## Pandas 📊
### Definition
Pandas is a powerful Python library used for data manipulation and analysis. It provides two main data structures:

- Series → 1D labeled array

- DataFrame → 2D labeled table of rows and columns

### Why Pandas?

- Easy to handle large datasets

- Supports multiple file formats (CSV, Excel, SQL, JSON, etc.)

- Built-in methods for filtering, grouping, merging, and reshaping data
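
The four operations named in the last point can be sketched on toy data. The two small tables and all their column names below are hypothetical and are not part of the dataset loaded later in this notebook.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'customer': ['A', 'B', 'A', 'C'],
    'amount': [100, 250, 50, 75],
})
customers = pd.DataFrame({
    'customer': ['A', 'B', 'C'],
    'city': ['Delhi', 'Mumbai', 'Delhi'],
})

# Filtering: keep orders of at least 75
big_orders = orders[orders['amount'] >= 75]

# Merging: attach each customer's city to their orders
merged = orders.merge(customers, on='customer', how='left')

# Grouping: total amount per city
per_city = merged.groupby('city')['amount'].sum()

# Reshaping: customers as rows, cities as columns
wide = merged.pivot_table(index='customer', columns='city',
                          values='amount', aggfunc='sum')

print(per_city)
print(wide)
```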

In [16]:
#Import pandas
import pandas as pd


## Creating a Series

In [17]:
import pandas as pd

# From a list
data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)


a    10
b    20
c    30
d    40
dtype: int64
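
A Series can also be built from a dictionary, in which case the keys become the index labels. The sketch below reuses the same values as the list example; the variable names are illustrative.

```python
import pandas as pd

# From a dictionary: keys become the index labels
prices = pd.Series({'a': 10, 'b': 20, 'c': 30, 'd': 40})

first = prices['a']          # label-based lookup of a single value
subset = prices[['b', 'd']]  # select several labels at once
doubled = prices * 2         # arithmetic applies element-wise

print(first)
print(subset)
print(doubled)
```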


## Creating a DataFrame

In [19]:
# From dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print(df)


      Name  Age       City
0    Alice   25      Delhi
1      Bob   30     Mumbai
2  Charlie   35  Bangalore
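
The same table can be built row by row from a list of dictionaries. This sketch also shows column selection and a derived column; `AgeInMonths` is a hypothetical column name added only for illustration.

```python
import pandas as pd

# From a list of records: each dictionary is one row
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'Delhi'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Mumbai'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Bangalore'},
]
people = pd.DataFrame(records)

# A single column is a Series; a list of columns gives a DataFrame
ages = people['Age']
name_city = people[['Name', 'City']]

# Derive a new column from an existing one (hypothetical column name)
people['AgeInMonths'] = people['Age'] * 12

print(people)
```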


## Reading & Writing Files

In [21]:
# Reading CSV file
df = pd.read_csv(r"D:\DATASETS-BI\Details.csv")  # raw string: backslashes are not treated as escapes
print(df)

# Writing CSV file
df.to_csv('output.csv', index=False)


     Order ID  Amount  Profit  Quantity     Category      Sub-Category  \
0     B-25681    1096     658         7  Electronics  Electronic Games   
1     B-26055    5729      64        14    Furniture            Chairs   
2     B-25955    2927     146         8    Furniture         Bookcases   
3     B-26093    2847     712         8  Electronics          Printers   
4     B-25602    2617    1151         4  Electronics            Phones   
...       ...     ...     ...       ...          ...               ...   
1495  B-25700       7      -3         2     Clothing       Hankerchief   
1496  B-25757    3151     -35         7     Clothing          Trousers   
1497  B-25973    4141    1698        13  Electronics          Printers   
1498  B-25698       7      -2         1     Clothing       Hankerchief   
1499  B-25993    4363     305         5    Furniture            Tables   

      PaymentMode  
0             COD  
1             EMI  
2             EMI  
3     Credit Card  
4     Credi
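
To try reading and writing without a file on disk, the round trip can be sketched with an in-memory buffer. The two data rows are copied from the output above; the variable names and the use of `io.StringIO` are choices made for this example.

```python
import io
import pandas as pd

# Two rows copied from the output above, held in memory instead of a file
csv_text = (
    "Order ID,Amount,Profit,Quantity\n"
    "B-25681,1096,658,7\n"
    "B-26055,5729,64,14\n"
)

# read_csv accepts any file-like object, not just a path
sample = pd.read_csv(io.StringIO(csv_text))

# With no path argument, to_csv returns the CSV text as a string
written = sample.to_csv(index=False)

# Reading the written text back gives an equal DataFrame
round_trip = pd.read_csv(io.StringIO(written))
print(round_trip.equals(sample))
```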

## Built-in pandas Functions


| Function / attribute | Description                                    |
| -------------------- | ---------------------------------------------- |
| `head()`             | Show the first rows (5 by default)             |
| `tail()`             | Show the last rows (5 by default)              |
| `info()`             | Column names, dtypes, and non-null counts      |
| `describe()`         | Summary statistics for numeric columns         |
| `shape`              | `(rows, columns)` tuple (attribute, no parentheses) |
| `drop()`             | Remove rows or columns                         |
| `sort_values()`      | Sort by one or more columns                    |
| `groupby()`          | Group rows by one or more columns              |
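
`drop()` and `groupby()` appear in the table but are not demonstrated in the cells that follow, so here is a minimal sketch. It uses the first five rows of the dataset shown earlier, trimmed to three columns; the variable names are illustrative.

```python
import pandas as pd

# First five rows of the dataset shown earlier, trimmed to three columns
sample = pd.DataFrame({
    'Category': ['Electronics', 'Furniture', 'Furniture',
                 'Electronics', 'Electronics'],
    'Amount': [1096, 5729, 2927, 2847, 2617],
    'Profit': [658, 64, 146, 712, 1151],
})

# drop() returns a new DataFrame; the original is left unchanged
without_profit = sample.drop(columns=['Profit'])
without_first_row = sample.drop(index=[0])

# groupby() splits rows by a key, then an aggregation combines each group
amount_by_category = sample.groupby('Category')['Amount'].sum()

print(sample.tail(2))
print(amount_by_category)
```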


In [None]:
# Display first rows
print(df.head())

# Display a single column (returned as a Series)
print(df['PaymentMode'])

  Order ID  Amount  Profit  Quantity     Category      Sub-Category  \
0  B-25681    1096     658         7  Electronics  Electronic Games   
1  B-26055    5729      64        14    Furniture            Chairs   
2  B-25955    2927     146         8    Furniture         Bookcases   
3  B-26093    2847     712         8  Electronics          Printers   
4  B-25602    2617    1151         4  Electronics            Phones   

   PaymentMode  
0          COD  
1          EMI  
2          EMI  
3  Credit Card  
4  Credit Card  
0               COD
1               EMI
2               EMI
3       Credit Card
4       Credit Card
           ...     
1495            COD
1496            EMI
1497            COD
1498            COD
1499            EMI
Name: PaymentMode, Length: 1500, dtype: object


In [10]:
# Filter data
print(df[df['Amount'] > 2000])


     Order ID  Amount  Profit  Quantity     Category      Sub-Category  \
1     B-26055    5729      64        14    Furniture            Chairs   
2     B-25955    2927     146         8    Furniture         Bookcases   
3     B-26093    2847     712         8  Electronics          Printers   
4     B-25602    2617    1151         4  Electronics            Phones   
5     B-25881    2244     247         4     Clothing          Trousers   
11    B-25887    2125    -234         6  Electronics          Printers   
12    B-25923    3873    -891         6  Electronics            Phones   
14    B-25761    2188    1050         5    Furniture         Bookcases   
18    B-25853    2093     721         5    Furniture            Chairs   
1457  B-26099    2366     552         5     Clothing          Trousers   
1480  B-25862    2061     701         5    Furniture         Bookcases   
1482  B-25823    2103     322         8  Electronics  Electronic Games   
1483  B-25881    2115      23         
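
Filters can be combined. The sketch below assumes a five-row sample copied from the first rows of the dataset; it shows `&` with parenthesised conditions, `isin()`, and the equivalent `query()` string.

```python
import pandas as pd

# Five-row sample copied from the first rows of the dataset
sample = pd.DataFrame({
    'Amount': [1096, 5729, 2927, 2847, 2617],
    'Profit': [658, 64, 146, 712, 1151],
    'Category': ['Electronics', 'Furniture', 'Furniture',
                 'Electronics', 'Electronics'],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
high_value_profitable = sample[(sample['Amount'] > 2000) & (sample['Profit'] > 500)]

# isin() tests membership in a list of values
furniture = sample[sample['Category'].isin(['Furniture'])]

# query() expresses the same combined filter as a string
same_filter = sample.query('Amount > 2000 and Profit > 500')

print(high_value_profitable)
```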

In [12]:
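print(df.info())            # Structure of data; info() prints its own report and returns None, so "None" is printed as well
print(df.describe())        # Stats summary of the numeric columns
print(df.shape)             # Dimensions as (rows, columns)
print(df.sort_values(by='Amount'))  # Rows sorted by Amount, ascending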

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Order ID      1500 non-null   object
 1   Amount        1500 non-null   int64 
 2   Profit        1500 non-null   int64 
 3   Quantity      1500 non-null   int64 
 4   Category      1500 non-null   object
 5   Sub-Category  1500 non-null   object
 6   PaymentMode   1500 non-null   object
dtypes: int64(3), object(4)
memory usage: 82.2+ KB
None
            Amount      Profit     Quantity
count  1500.000000  1500.00000  1500.000000
mean    291.847333    24.64200     3.743333
std     461.924620   168.55881     2.184942
min       4.000000 -1981.00000     1.000000
25%      47.750000   -12.00000     2.000000
50%     122.000000     8.00000     3.000000
75%     326.250000    38.00000     5.000000
max    5729.000000  1864.00000    14.000000
(1500, 7)
     Order ID  Amount  Profit  Quantity     Category S
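
`sort_values()` sorts in ascending order by default. A short sketch of descending and multi-key sorting, again assuming a five-row sample copied from the first rows of the dataset:

```python
import pandas as pd

sample = pd.DataFrame({
    'Category': ['Electronics', 'Furniture', 'Furniture',
                 'Electronics', 'Electronics'],
    'Amount': [1096, 5729, 2927, 2847, 2617],
})

# Largest amounts first
by_amount_desc = sample.sort_values(by='Amount', ascending=False)

# Sort by Category (A to Z), then by Amount (largest first) within each category
by_category_then_amount = sample.sort_values(
    by=['Category', 'Amount'], ascending=[True, False]
)

# sort_values keeps the original index labels; reset_index renumbers from 0
renumbered = by_amount_desc.reset_index(drop=True)

print(by_category_then_amount)
```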

## Getting Started: A Sample DataFrame

In [22]:
## First step: import pandas and NumPy

import pandas as pd
import numpy as np

In [23]:
## DataFrame

# pd.DataFrame(data, index, columns, dtype); "Coumn4" (sic) is the label the outputs below use
df = pd.DataFrame(np.arange(0, 20).reshape(5, 4),
                  index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                  columns=["Column1", "Column2", "Column3", "Coumn4"])

In [24]:
df.head()


      Column1  Column2  Column3  Coumn4
Row1        0        1        2       3
Row2        4        5        6       7
Row3        8        9       10      11
Row4       12       13       14      15
Row5       16       17       18      19


In [25]:
df.to_csv('Test.csv')
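
Because `to_csv` writes the row labels as the first CSV column by default, reading the file back needs `index_col=0` to restore them. The sketch below keeps the CSV text in memory rather than in `Test.csv`; the column labels are copied verbatim from the cell above, and the variable names are illustrative.

```python
import io
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
    columns=['Column1', 'Column2', 'Column3', 'Coumn4'],
)

# Same content that to_csv('Test.csv') would write, kept in memory here
csv_text = frame.to_csv()

# Without index_col, the row labels come back as a column named "Unnamed: 0"
naive = pd.read_csv(io.StringIO(csv_text))

# index_col=0 restores the first CSV column as the index
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(naive.columns.tolist())
print(restored.index.tolist())
```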

In [26]:
## Two ways to access elements
# 1 -> loc (label-based)  2 -> iloc (integer position-based)

df.loc['Row1']

Column1    0
Column2    1
Column3    2
Coumn4     3
Name: Row1, dtype: int64

In [27]:
## Check the type

type(df.loc['Row1'])

pandas.core.series.Series

In [28]:
## Select all rows, and every column from Column2 (position 1) onward
df.iloc[:,1:]

      Column2  Column3  Coumn4
Row1        1        2       3
Row2        5        6       7
Row3        9       10      11
Row4       13       14      15
Row5       17       18      19


In [29]:
type(df.iloc[:,1:])

pandas.core.frame.DataFrame
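
One difference between the two accessors is easy to miss: `loc` slices include the end label, while `iloc` slices exclude the end position. A minimal sketch, using hypothetical short column labels `A` to `D` on the same 5x4 grid of numbers:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(0, 20).reshape(5, 4),
    index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
    columns=['A', 'B', 'C', 'D'],  # hypothetical short labels
)

# Single cell: row label + column label, or row position + column position
by_label = frame.loc['Row2', 'B']
by_position = frame.iloc[1, 1]

# Label slices include the end label; position slices exclude the end position
label_slice = frame.loc['Row1':'Row3']   # Row1, Row2, Row3
position_slice = frame.iloc[0:3]         # positions 0, 1, 2

print(by_label, by_position)
print(label_slice.equals(position_slice))
```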

In [32]:
# Convert the DataFrame to a NumPy array
df.iloc[:,1:].values

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11],
       [13, 14, 15],
       [17, 18, 19]])

In [34]:
df.iloc[:,1:].values.shape

(5, 3)
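
The pandas documentation recommends `to_numpy()` over the `.values` attribute for this conversion. A sketch on the same 5x4 grid of numbers (default integer labels are used here for brevity):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(0, 20).reshape(5, 4))

# to_numpy() returns the same data as .values for this all-integer frame
arr = frame.iloc[:, 1:].to_numpy()

print(type(arr))
print(arr.shape)
```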

In [35]:
df['Column1'].value_counts()

Column1
0     1
4     1
8     1
12    1
16    1
Name: count, dtype: int64
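
Every value in `Column1` is unique, so each count above is 1. `value_counts()` is more informative on a column with repeats; the sketch below uses the payment modes of the first five orders shown earlier.

```python
import pandas as pd

# Payment modes of the first five orders shown earlier
modes = pd.Series(['COD', 'EMI', 'EMI', 'Credit Card', 'Credit Card'],
                  name='PaymentMode')

counts = modes.value_counts()                 # absolute counts per value
shares = modes.value_counts(normalize=True)   # fractions that sum to 1
distinct = modes.nunique()                    # number of distinct values

print(counts)
print(shares)
print(distinct)
```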

In [36]:
df.isnull().sum()

Column1    0
Column2    0
Column3    0
Coumn4     0
dtype: int64
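
The sample frame has no missing values, so every count is 0. The sketch below uses hypothetical data with two gaps to show what `isnull().sum()` reports and two common ways to handle the gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values
frame = pd.DataFrame({
    'Column1': [0.0, np.nan, 8.0],
    'Column2': [1.0, 5.0, np.nan],
})

missing_per_column = frame.isnull().sum()   # one count per column
rows_without_gaps = frame.dropna()          # drop rows containing any NaN
filled = frame.fillna(0)                    # replace NaN with a constant

print(missing_per_column)
print(rows_without_gaps)
print(filled)
```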