### Pandas = Python Data Analysis Library

Pandas is a powerful tool for working with structured data (like tables in Excel or SQL).
It is built on top of NumPy and adds tools for handling rows and columns easily.

You’ll use Pandas for:
✅ Data cleaning
✅ Data analysis
✅ Data manipulation (filter, group, merge, etc.)
✅ Reading/writing files (CSV, Excel, SQL, etc.)

| **Method**                       | **What It Does**                                          |
| -------------------------------- | --------------------------------------------------------- |
| `pd.DataFrame(data)`             | Create a DataFrame from data (list, dict, array, etc.).   |
| `df.set_index('column')`         | Set a column as the row index.                            |
| `df.loc[row, col]`               | Access rows and columns by labels.                        |
| `df.iloc[row, col]`              | Access rows and columns by index positions.               |
| `df['new_column'] = ...`         | Add a new column to the DataFrame.                        |
| `df.sum(axis=0)`                 | Compute column-wise sum (default).                        |
| `df.sum(axis=1)`                 | Compute row-wise sum.                                     |
| `df.fillna(value)`               | Replace all NaN (missing values) with a specified value.  |
| `df.fillna(df.mean())`           | Replace NaN with the column mean.                         |
| `df.dropna()`                    | Drop rows with any NaN values.                            |
| `df.groupby('col')`              | Group data by unique values in a column.                  |
| `df.groupby('col')['val'].sum()` | Group by column and compute sum of another column.        |
| `df.groupby(['A', 'B'])`         | Perform grouping on multiple columns (multi-level group). |
| `pd.merge(df1, df2, on='col')`   | Merge two DataFrames using a common column.               |
| `pd.concat([df1, df2], axis=0)`  | Concatenate DataFrames vertically (row-wise).             |
| `pd.concat([df1, df2], axis=1)`  | Concatenate DataFrames horizontally (column-wise).        |
| `pd.date_range(start, end)`      | Create a date range (useful for datetime index).          |
| `df.resample('M').mean()`        | Resample data (e.g., monthly) and compute mean; pandas 2.2+ prefers `'ME'` over `'M'`. |
| `df.rolling(window=7).mean()`    | Compute a rolling (moving) average over a window size.    |
| `pd.MultiIndex.from_tuples()`    | Create a multi-level (hierarchical) index.                |
| `df.loc[('A', 'X')]`             | Access data from a MultiIndex DataFrame.                  |
| `df.groupby(level='Category').sum()` | Sum over one level in a MultiIndex DataFrame (`df.sum(level=...)` was removed in pandas 2.0). |
| `pd.pivot_table()`               | Create pivot tables for summarizing data.                 |
| `df.apply(func)`                 | Apply a function row-wise or column-wise.                 |
| `df.applymap(func)`              | Apply a function element-wise on all DataFrame values (renamed `df.map` in pandas 2.1+). |
| `series.str.upper()`             | Convert all strings in a Series to uppercase.             |
| `series.str[:3]`                 | Slice the first 3 characters of each string in a Series.  |


In [2]:
import pandas as pd

### A Series is:
✅ A one-dimensional labeled array in Pandas.
✅ It holds data values and an index (labels for those values).
✅ Think of it as a single column of an Excel sheet.

In [4]:
lst=[1,2,4,6,8]

series=pd.Series(lst)
print(series)
print(type(series))

0    1
1    2
2    4
3    6
4    8
dtype: int64
<class 'pandas.core.series.Series'>


In [5]:
## series from dictionary
data={'a':123,'b':980,'c':567}

series_dict=pd.Series(data)
print(series_dict)

a    123
b    980
c    567
dtype: int64


In [7]:
data=[123,456,789,975,421,909]
idx=['a','b','c','d','e','f']

series_data=pd.Series(data,index=idx)
print(series_data)

a    123
b    456
c    789
d    975
e    421
f    909
dtype: int64


#### What is a DataFrame?
A DataFrame is:
✅ A 2D table in Pandas (rows and columns)
✅ Like an Excel sheet or a SQL table
✅ Built using Series (each column is a Series)

It’s the most powerful structure in Pandas for data analysis.
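To make the "each column is a Series" point concrete, here is a minimal sketch. The `df_demo` frame and its values are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Illustrative data (hypothetical, not from the notebook)
df_demo = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Selecting one column with [] returns a Series
age_col = df_demo["Age"]
print(type(age_col))

# The column shares the DataFrame's row index
print(age_col.index.equals(df_demo.index))
```

Selecting with a list of names, such as `df_demo[["Age"]]`, returns a DataFrame instead of a Series.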

In [8]:
## create dataframe from dictionary of lists
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
}

df=pd.DataFrame(data)
print(df)
print(type(df))

      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
<class 'pandas.core.frame.DataFrame'>


In [3]:
## create a DataFrame from a list of dictionaries
data=[
    {'name':"susovan","age":23,"city":"kolkata"},
    {'name':"pupai","age":23,"city":"kolkata"},
    {'name':"sovan","age":21,"city":"Howrah"},
    {'name':"Mimi","age":20,"city":"kolkata"},
    {'name':"Taniya","age":23,"city":"kolkata"},
    {'name':"Ram","age":24,"city":"Huggly"},
]

df=pd.DataFrame(data)
print(df)
print(type(df))

      name  age     city
0  susovan   23  kolkata
1    pupai   23  kolkata
2    sovan   21   Howrah
3     Mimi   20  kolkata
4   Taniya   23  kolkata
5      Ram   24   Huggly
<class 'pandas.core.frame.DataFrame'>


In [None]:
## access cols
print(df["name"])

0    susovan
1      pupai
2      sovan
3       Mimi
4     Taniya
5        Ram
Name: name, dtype: object


In [4]:
print(df[["name","age"]])
print(df[["age","city"]])

      name  age
0  susovan   23
1    pupai   23
2    sovan   21
3     Mimi   20
4   Taniya   23
5      Ram   24
   age     city
0   23  kolkata
1   23  kolkata
2   21   Howrah
3   20  kolkata
4   23  kolkata
5   24   Huggly


### df.iloc[] → Access by Position
✅ Access rows or cells by integer position (row & column numbers).

### df.loc[] → Access by Labels
✅ Access rows or cells by row labels (index) and column names.

In [11]:
### access row
data=[
    {'name':"susovan","age":23,"city":"Delhi"},
    {'name':"pupai","age":23,"city":"Mumbai"},
    {'name':"sovan","age":21,"city":"Howrah"},
    {'name':"Mimi","age":20,"city":"kolkata"},
    {'name':"Taniya","age":23,"city":"Goa"},
    {'name':"Ram","age":24,"city":"Huggly"},
]
idx=['a','b','c','d','e','f']
df=pd.DataFrame(data,index=idx)
print(df)
print(df.loc['b'])
print(df.loc['c',"city"])


      name  age     city
a  susovan   23    Delhi
b    pupai   23   Mumbai
c    sovan   21   Howrah
d     Mimi   20  kolkata
e   Taniya   23      Goa
f      Ram   24   Huggly
name     pupai
age         23
city    Mumbai
Name: b, dtype: object
Howrah


In [None]:
print(df.loc['c':'f',"age"])  ## work on multiple cells

print(df.iloc[:2,[1,0]])

c    21
d    20
e    23
f    24
Name: age, dtype: int64
   age     name
a   23  susovan
b   23    pupai


In [16]:
df

|     | name    | age | city    |
| --- | ------- | --- | ------- |
| a   | susovan | 23  | Delhi   |
| b   | pupai   | 23  | Mumbai  |
| c   | sovan   | 21  | Howrah  |
| d   | Mimi    | 20  | kolkata |
| e   | Taniya  | 23  | Goa     |
| f   | Ram     | 24  | Huggly  |


In [None]:
print(df.iloc[0])
print(df.iloc[3,2])

print(df.iloc[1:3,2])  ### work on multiple cells

name    susovan
age          23
city      Delhi
Name: a, dtype: object
kolkata
b    Mumbai
c    Howrah
Name: city, dtype: object
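The outputs above hint at a difference that is easy to miss: a `loc` slice includes its end label (`'c':'f'` returned `f`), while an `iloc` slice stops before its end position (`1:3` returned only positions 1 and 2). A small sketch on a cut-down, illustrative copy of the frame:

```python
import pandas as pd

# Cut-down copy of the example frame (only the age column, for illustration)
df_demo = pd.DataFrame(
    {"age": [23, 23, 21, 20, 23, 24]},
    index=["a", "b", "c", "d", "e", "f"],
)

by_label = df_demo.loc["b":"d", "age"]   # labels b, c, d: end label is included
by_position = df_demo.iloc[1:3, 0]       # positions 1 and 2: end position is excluded

print(list(by_label.index))     # ['b', 'c', 'd']
print(list(by_position.index))  # ['b', 'c']
```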


### df.at[]: Access by Labels
Use it to read or write a single value when you know the row label and column name.
### df.iat[]: Access by Integer Position
Use it to read or write a single value when you know the row number and column number.

In [None]:
df.at['c',"age"]  ## fast data access

np.int64(21)

### axis=0 👉 operate along rows (DOWN the rows, moving vertically), giving one result per column

### axis=1 👉 operate along columns (ACROSS the columns, moving horizontally), giving one result per row
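A quick way to check the axis rule is to sum a tiny frame both ways. The frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical 3x2 frame
df_demo = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

col_totals = df_demo.sum(axis=0)  # walks down the rows: one total per column
row_totals = df_demo.sum(axis=1)  # walks across the columns: one total per row

print(col_totals.to_dict())   # {'A': 6, 'B': 60}
print(row_totals.tolist())    # [11, 22, 33]
```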

In [None]:
## fast data access
df.iat[3,0]

'Mimi'
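`at` and `iat` can also write a single cell, not only read it. A minimal sketch, assuming a small made-up frame (the replacement values are hypothetical):

```python
import pandas as pd

# Two illustrative rows (values are hypothetical)
df_demo = pd.DataFrame(
    {"name": ["Mimi", "Ram"], "age": [20, 24]},
    index=["d", "f"],
)

df_demo.at["d", "age"] = 21       # write by row label and column name
df_demo.iat[1, 0] = "Ramesh"      # write by row position 1, column position 0

print(df_demo)
```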

### df.drop() is used to delete rows or columns from a DataFrame.
✅ It returns a new DataFrame by default (it does NOT change the original DataFrame unless you set inplace=True).

labels  -> Row or column labels to drop

axis    -> 0 = rows (default), 1 = columns

inplace -> False = return a new DataFrame (default)
        -> True = modify the original DataFrame

In [17]:
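df["salary"]=[2000,3000,8000,9000,4000,7000]   ## add a column
df

|     | name    | age | city    | salary |
| --- | ------- | --- | ------- | ------ |
| a   | susovan | 23  | Delhi   | 2000   |
| b   | pupai   | 23  | Mumbai  | 3000   |
| c   | sovan   | 21  | Howrah  | 8000   |
| d   | Mimi    | 20  | kolkata | 9000   |
| e   | Taniya  | 23  | Goa     | 4000   |
| f   | Ram     | 24  | Huggly  | 7000   |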


In [18]:
new_df=df.drop("salary",axis=1)  ## not a permanent delete; returns a new DataFrame
new_df

|     | name    | age | city    |
| --- | ------- | --- | ------- |
| a   | susovan | 23  | Delhi   |
| b   | pupai   | 23  | Mumbai  |
| c   | sovan   | 21  | Howrah  |
| d   | Mimi    | 20  | kolkata |
| e   | Taniya  | 23  | Goa     |
| f   | Ram     | 24  | Huggly  |


In [None]:
df

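|     | name    | age | city    | salary |
| --- | ------- | --- | ------- | ------ |
| a   | susovan | 23  | Delhi   | 2000   |
| b   | pupai   | 23  | Mumbai  | 3000   |
| c   | sovan   | 21  | Howrah  | 8000   |
| d   | Mimi    | 20  | kolkata | 9000   |
| e   | Taniya  | 23  | Goa     | 4000   |
| f   | Ram     | 24  | Huggly  | 7000   |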


In [None]:
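df.drop("salary",axis=1,inplace=True)  # permanently delete the column from the current DataFrame
df

|     | name    | age | city    |
| --- | ------- | --- | ------- |
| a   | susovan | 23  | Delhi   |
| b   | pupai   | 23  | Mumbai  |
| c   | sovan   | 21  | Howrah  |
| d   | Mimi    | 20  | kolkata |
| e   | Taniya  | 23  | Goa     |
| f   | Ram     | 24  | Huggly  |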


In [None]:
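## add 1 to every value in the age column
df["age"]=df["age"]+1
df

|     | name    | age | city    |
| --- | ------- | --- | ------- |
| a   | susovan | 24  | Delhi   |
| b   | pupai   | 24  | Mumbai  |
| c   | sovan   | 22  | Howrah  |
| d   | Mimi    | 21  | kolkata |
| e   | Taniya  | 24  | Goa     |
| f   | Ram     | 25  | Huggly  |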


In [None]:
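df.drop('c')  ## drops row 'c' in a new DataFrame; pass inplace=True to delete it from df permanently

|     | name    | age | city    |
| --- | ------- | --- | ------- |
| a   | susovan | 24  | Delhi   |
| b   | pupai   | 24  | Mumbai  |
| d   | Mimi    | 21  | kolkata |
| e   | Taniya  | 24  | Goa     |
| f   | Ram     | 25  | Huggly  |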
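Besides `labels` plus `axis`, `drop()` also accepts `index=` and `columns=` keywords, which can read more clearly when removing several rows and columns at once. A small sketch on a made-up three-row frame:

```python
import pandas as pd

# Illustrative frame (a shortened, hypothetical version of the one above)
df_demo = pd.DataFrame(
    {"name": ["susovan", "pupai", "sovan"],
     "age": [24, 24, 22],
     "city": ["Delhi", "Mumbai", "Howrah"]},
    index=["a", "b", "c"],
)

# Drop rows 'a' and 'c' and the 'city' column in one call; df_demo itself is unchanged
trimmed = df_demo.drop(index=["a", "c"], columns=["city"])

print(trimmed)
print(df_demo.shape)  # (3, 3)
```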


In [None]:
### read csv file

df=pd.read_csv("sales_data.csv")

In [None]:
df.head(5)

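|     | Transaction ID | Date       | Product Category | Product Name            | Units Sold | Unit Price | Total Revenue | Region        | Payment Method |
| --- | -------------- | ---------- | ---------------- | ----------------------- | ---------- | ---------- | ------------- | ------------- | -------------- |
| 0   | 10001          | 2024-01-01 | Electronics      | iPhone 14 Pro           | 2          | 999.99     | 1999.98       | North America | Credit Card    |
| 1   | 10002          | 2024-01-02 | Home Appliances  | Dyson V11 Vacuum        | 1          | 499.99     | 499.99        | Europe        | PayPal         |
| 2   | 10003          | 2024-01-03 | Clothing         | Levi's 501 Jeans        | 3          | 69.99      | 209.97        | Asia          | Debit Card     |
| 3   | 10004          | 2024-01-04 | Books            | The Da Vinci Code       | 4          | 15.99      | 63.96         | North America | Credit Card    |
| 4   | 10005          | 2024-01-05 | Beauty Products  | Neutrogena Skincare Set | 1          | 89.99      | 89.99         | Europe        | PayPal         |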


In [None]:
df.tail(5)

|     | Transaction ID | Date       | Product Category | Product Name                                    | Units Sold | Unit Price | Total Revenue | Region        | Payment Method |
| --- | -------------- | ---------- | ---------------- | ----------------------------------------------- | ---------- | ---------- | ------------- | ------------- | -------------- |
| 235 | 10236          | 2024-08-23 | Home Appliances  | Nespresso Vertuo Next Coffee and Espresso Maker | 1          | 159.99     | 159.99        | Europe        | PayPal         |
| 236 | 10237          | 2024-08-24 | Clothing         | Nike Air Force 1 Sneakers                       | 3          | 90.0       | 270.0         | Asia          | Debit Card     |
| 237 | 10238          | 2024-08-25 | Books            | The Handmaid's Tale by Margaret Atwood          | 3          | 10.99      | 32.97         | North America | Credit Card    |
| 238 | 10239          | 2024-08-26 | Beauty Products  | Sunday Riley Luna Sleeping Night Oil            | 1          | 55.0       | 55.0          | Europe        | PayPal         |
| 239 | 10240          | 2024-08-27 | Sports           | Yeti Rambler 20 oz Tumbler                      | 2          | 29.99      | 59.98         | Asia          | Credit Card    |


In [None]:
print("data types : \n\n",df.dtypes)

data types : 

 Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Region               object
Payment Method       object
dtype: object
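The `Date` column above comes back as `object` (plain strings). One common way to get real datetimes is the `parse_dates` argument of `pd.read_csv`. The sketch below uses an in-memory, cut-down copy of two rows (only four of the columns) so it runs without `sales_data.csv`; the `csv_text` and `sales` names are just for this example.

```python
import io
import pandas as pd

# Two rows copied from the head() output above, reduced to four columns for brevity
csv_text = """Transaction ID,Date,Units Sold,Unit Price
10001,2024-01-01,2,999.99
10002,2024-01-02,1,499.99
"""

# parse_dates asks read_csv to convert the listed columns to datetime64
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sales.dtypes)
print(sales["Date"].dt.day.tolist())  # [1, 2]
```

For a frame that is already loaded, `df["Date"] = pd.to_datetime(df["Date"])` should give the same result.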


In [None]:
print("statistical Summary: \n\n",df.describe())

statistical Summary: 

        Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000



### ✅ Key Pandas Concepts from Your Assignments

### 📄 Assignment 1: DataFrame Creation and Indexing
`pd.DataFrame(data)` → Creates a DataFrame.

`df.set_index('column_name')` → Sets a column as index.

`df.loc[row_label, column_label]` → Access data by row and column labels.
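A short sketch tying these three calls together. The data is made up, and `indexed` is just a name chosen for this example.

```python
import pandas as pd

# Hypothetical data
df_demo = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# set_index returns a new DataFrame; df_demo keeps its default 0..n-1 index
indexed = df_demo.set_index("name")

# With names as the index, loc can look up a value by row label and column label
bob_age = indexed.loc["Bob", "age"]
print(bob_age)
```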

### 📄 Assignment 2: DataFrame Operations
`df['new_column'] = df['A'] * df['B']` → Add new column from operations.

`df.sum(axis=0)` → Column-wise sum.

`df.sum(axis=1)` → Row-wise sum.

### 🧹 Assignment 3: Data Cleaning
`df.fillna(value)` → Fill NaN with specified value.

`df.fillna(df.mean())` → Fill NaN with column mean.

`df.dropna()` → Drops rows with any NaN values.
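A minimal sketch of the three cleaning calls on a made-up numeric frame. Note that in pandas 2.0+ `df.mean()` raises a `TypeError` if the frame also has text columns, in which case `df.mean(numeric_only=True)` is the usual workaround.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column
df_demo = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 4.0, 6.0]})

filled_zero = df_demo.fillna(0)               # every NaN becomes 0
filled_mean = df_demo.fillna(df_demo.mean())  # NaN in A -> 2.0, NaN in B -> 5.0
dropped = df_demo.dropna()                    # keeps only rows with no NaN (row 2)

print(filled_mean)
print(dropped)
```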

### 📊 Assignment 4: Data Aggregation
`df.groupby('Category')['Value'].sum()` → Group by and aggregate.

`df.groupby('Category')['Value'].mean()` → Mean per group.

`df.groupby(['Category', 'SubCategory'])` → Multi-level grouping.
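Here is a small sketch of these group-by patterns. The column names follow the ones used above; the values are invented.

```python
import pandas as pd

# Hypothetical data
df_demo = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "SubCategory": ["X", "Y", "X", "X"],
    "Value": [10, 20, 30, 40],
})

totals = df_demo.groupby("Category")["Value"].sum()    # A -> 30, B -> 70
means = df_demo.groupby("Category")["Value"].mean()    # A -> 15.0, B -> 35.0

# Grouping on two columns gives a result with a two-level (MultiIndex) index
multi = df_demo.groupby(["Category", "SubCategory"])["Value"].sum()

print(totals)
print(multi)
```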

### 🔗 Assignment 5: Merging DataFrames
`pd.merge(df1, df2, on='common_column')` → Merge on common column.

`pd.concat([df1, df2], axis=0)` → Concatenate along rows.

`pd.concat([df1, df2], axis=1)` → Concatenate along columns.
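A sketch of how merge and concat differ, using two small made-up frames (`left` and `right` are names for this example only). By default `pd.merge` does an inner join, so only keys present in both frames survive.

```python
import pandas as pd

# Hypothetical frames sharing a 'key' column
left = pd.DataFrame({"key": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "score": [20, 30, 40]})

merged = pd.merge(left, right, on="key")                       # inner join: keys 2 and 3
stacked = pd.concat([left, left], axis=0, ignore_index=True)   # rows stacked: 6 rows
side_by_side = pd.concat([left, right], axis=1)                # aligned on row index: 4 columns

print(merged)
print(stacked.shape, side_by_side.shape)
```

Passing `how='left'`, `how='right'`, or `how='outer'` to `pd.merge` keeps unmatched keys from one or both sides.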

### 🕑 Assignment 6: Time Series Analysis
`pd.date_range(start, end)` → Create date index.

`df.resample('M').mean()` → Resample to monthly mean (pandas 2.2+ prefers `'ME'` over `'M'`).

`df.rolling(window=7).mean()` → Rolling mean over a window of 7 rows (7 days for daily data).
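A minimal time-series sketch on invented daily data. It uses the `'MS'` (month start) alias rather than `'M'`, as an assumption to stay compatible across pandas versions, since `'M'` is deprecated in pandas 2.2 in favor of `'ME'`. The only practical difference here is that each monthly row is labeled with the first day of the month.

```python
import pandas as pd

# Daily index covering January and February 2024 (31 + 29 = 60 days)
idx = pd.date_range("2024-01-01", "2024-02-29")
ts = pd.DataFrame({"value": range(len(idx))}, index=idx)  # values 0..59, made up

monthly = ts.resample("MS").mean()      # one row per month: means 15.0 and 45.0
rolling = ts.rolling(window=7).mean()   # first 6 rows are NaN (window not full yet)

print(monthly)
print(rolling.head(8))
```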

### 🔗 Assignment 7: MultiIndex DataFrame
`pd.MultiIndex.from_tuples()` → Create multi-level index.

`df.loc[('A', 'X')]` → Access MultiIndex data.

`df.groupby(level='Category').sum()` → Sum over MultiIndex level (`df.sum(level='Category')` was removed in pandas 2.0).
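A small MultiIndex sketch with made-up values. For the per-level sum it uses `groupby(level=...)`, because the `level` argument of `df.sum()` is no longer available in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical two-level index
index = pd.MultiIndex.from_tuples(
    [("A", "X"), ("A", "Y"), ("B", "X")],
    names=["Category", "SubCategory"],
)
df_demo = pd.DataFrame({"Value": [1, 2, 3]}, index=index)

row = df_demo.loc[("A", "X")]                        # one row, returned as a Series
per_cat = df_demo.groupby(level="Category").sum()    # A -> 3, B -> 3

print(row)
print(per_cat)
```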

### 📌 Assignment 8: Pivot Tables
`pd.pivot_table(df, values='Value', index='Date', columns='Category', aggfunc='sum')` → Create pivot tables.
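To see what that call produces, here is a sketch on four invented rows. Combinations that never occur in the data (here, category B on the second date) show up as NaN.

```python
import pandas as pd

# Hypothetical long-format data
df_demo = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "Category": ["A", "B", "A", "A"],
    "Value": [1, 2, 3, 4],
})

# One row per Date, one column per Category, cells hold the summed Value
pivot = pd.pivot_table(
    df_demo, values="Value", index="Date", columns="Category", aggfunc="sum"
)

print(pivot)
```

Adding `fill_value=0` to the call replaces those NaN cells with 0.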

### ⚙️ Assignment 9: Applying Functions
`df.apply(func)` → Apply function to each column (or to each row with `axis=1`).

`df.applymap(func)` → Apply function element-wise (renamed `df.map` in pandas 2.1+).

`df['new'] = df.apply(lambda row: row.sum(), axis=1)` → Row-wise operation.
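A sketch of column-wise, row-wise, and element-wise application on a made-up frame. For the element-wise case it uses `Series.map` inside `apply`, an assumption made to avoid depending on either `applymap` (deprecated in pandas 2.1) or `DataFrame.map` (not available before 2.1).

```python
import pandas as pd

# Hypothetical 2x2 frame
df_demo = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

col_range = df_demo.apply(lambda col: col.max() - col.min())    # per column: A -> 1, B -> 1
row_total = df_demo.apply(lambda row: row.sum(), axis=1)        # per row: 4, 6
doubled = df_demo.apply(lambda col: col.map(lambda x: x * 2))   # every element times 2

print(col_range.to_dict())
print(row_total.tolist())
print(doubled)
```

For simple arithmetic like this, vectorized expressions such as `df_demo * 2` or `df_demo.sum(axis=1)` are usually faster than `apply`.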

### ✏️ Assignment 10: Working with Text Data
`series.str.upper()` → Convert text to uppercase.

`series.str[:3]` → Slice first 3 characters of each string.
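A short sketch of the two string operations, reusing a few names from the earlier examples.

```python
import pandas as pd

names = pd.Series(["susovan", "Mimi", "Ram"])

upper = names.str.upper()   # uppercase every string
first3 = names.str[:3]      # first three characters of every string

print(upper.tolist())   # ['SUSOVAN', 'MIMI', 'RAM']
print(first3.tolist())  # ['sus', 'Mim', 'Ram']
```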