#### Pandas-DataFrame And Series
Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. 
- A Series is a one-dimensional, labeled, array-like object.
- A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [1]:
import pandas as pd

In [3]:
## Series
# A Pandas Series is a one-dimensional array-like object that can hold any data type.
# It is similar to a column in a table.
# The first column in the output (0, 1, 2, 3, 4) is the index of the Series.

import pandas as pd
data=[1,2,3,4,5]

series=pd.Series(data)

print("Series \n",series)
# print(type(series))

Series 
 0    1
1    2
2    3
3    4
4    5
dtype: int64


In [None]:
## Create a Series from dictionary
data={'a':1,'b':2,'c':3}

series_dict=pd.Series(data)
print(series_dict)


a    1
b    2
c    3
dtype: int64


In [5]:
## Create a Series with a custom index
data=[10,20,30]
index=['a','b','c']
pd.Series(data,index=index)

a    10
b    20
c    30
dtype: int64
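
With a custom index, values can be read either by label or by position. A minimal sketch that rebuilds the same Series (the variable name `s` is only for illustration):

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s['b'])            # access by index label
print(s.iloc[1])         # access by zero-based position
print(s.index.tolist())  # the labels themselves
```

Both lookups return the same element here because label `'b'` sits at position 1.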

In [4]:
## Dataframe
## Create a DataFrame from a dictionary of lists

data={
    'Name':['Kapil','John','Jack'],
    'Age':[25,30,45],
    'City':['Bangalore','New York','Florida']
}
df=pd.DataFrame(data)
print(df)
print(type(df))

    Name  Age       City
0  Kapil   25  Bangalore
1   John   30   New York
2   Jack   45    Florida
<class 'pandas.core.frame.DataFrame'>
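
Each column of a DataFrame is itself a Series, and each column keeps its own dtype. The sketch below rebuilds the same data under the illustrative name `people` to show this:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Kapil', 'John', 'Jack'],
    'Age': [25, 30, 45],
    'City': ['Bangalore', 'New York', 'Florida'],
})

ages = people['Age']     # a single column comes back as a Series
print(type(ages))
print(people.shape)      # (number of rows, number of columns)
print(people.dtypes)     # numeric and text columns have different dtypes
```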


**Difference between `df.loc` and `df.iloc`**

---

### 1. Selection basis
- **`df.loc`**  
  - Selects by **label** (row and column names).  
  - Includes both start and end of a slice.  
- **`df.iloc`**  
  - Selects by **integer position** (zero-based).  
  - End of a slice is **exclusive**.

---

### 2. Quick takeaways
- Use **`.loc`** when you know the **index/column names**.  
- Use **`.iloc`** when you want purely **positional** selection.
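
The distinction matters most when the index labels are integers that do not match the row positions. A minimal sketch, assuming a made-up frame `demo` with labels 10, 20, 30 (not part of the data used elsewhere in this notebook):

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions.
demo = pd.DataFrame({'value': [100, 200, 300]}, index=[10, 20, 30])

by_label = demo.loc[10:20]     # label slice: both 10 and 20 are included
by_position = demo.iloc[0:1]   # position slice: position 1 is excluded

print(by_label)
print(by_position)
```

Here `.loc[10:20]` returns two rows while `.iloc[0:1]` returns one, which is the inclusive versus exclusive behaviour listed above.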

In [14]:
# 1.Sample DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}, index=['row1', 'row2', 'row3'])

print(df)

       A   B   C
row1  10  40  70
row2  20  50  80
row3  30  60  90


In [13]:
#### a) Single-element access
# - **By label**:  
# df.loc['row2', 'B']      # → 50

# - **By position**:  
df.iloc[1, 1]            # → 50


50

In [16]:
#### b) Row slicing
# - **Labels** (inclusive):  
# df.loc['row1':'row2']

# - **Positions** (end exclusive):  
df.iloc[0:2]    

       A   B   C
row1  10  40  70
row2  20  50  80


In [18]:
#### c) Column slicing
# - **Labels** (inclusive):  
# df.loc[:, 'A':'B']    
# - **Positions** (end exclusive):  
df.iloc[:, 0:2]

       A   B
row1  10  40
row2  20  50
row3  30  60


In [26]:
df=pd.read_csv('sales_data.csv')
df.head(3)

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
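
`sales_data.csv` is not included here, so the following sketch reads a small in-memory CSV instead. It assumes a shortened, hypothetical column subset in the same layout as the output above, and uses `parse_dates` to turn the `Date` column into real datetimes:

```python
import io
import pandas as pd

# Hypothetical two-row extract; the real file has more columns and rows.
csv_text = """Transaction ID,Date,Product Category,Units Sold,Unit Price
10001,2024-01-01,Electronics,2,999.99
10002,2024-01-02,Home Appliances,1,499.99
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
print(sample.dtypes)
print(sample.head())
```

`pd.read_csv` accepts a file path or any file-like object, which is why `io.StringIO` works as a stand-in.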


In [27]:
# `df.tail(3)` shows the last 3 rows of your DataFrame `df` (the default, `df.tail()`, shows the last 5). It's a quick way to inspect the end of your data.
df.tail(3)

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
237,10238,2024-08-25,Books,The Handmaid's Tale by Margaret Atwood,3,10.99,32.97,North America,Credit Card
238,10239,2024-08-26,Beauty Products,Sunday Riley Luna Sleeping Night Oil,1,55.0,55.0,Europe,PayPal
239,10240,2024-08-27,Sports,Yeti Rambler 20 oz Tumbler,2,29.99,59.98,Asia,Credit Card
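
Both `head` and `tail` return 5 rows when called without an argument. A small sketch on a made-up ten-row frame (`numbers` is an illustrative name):

```python
import pandas as pd

numbers = pd.DataFrame({'n': range(10)})

print(len(numbers.head()))            # no argument: first 5 rows
print(len(numbers.tail()))            # no argument: last 5 rows
print(numbers.tail(3)['n'].tolist())  # explicit count: last 3 rows
```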


In [30]:
### Accessing Data From Dataframe
# df

df['Product Name']

0                                        iPhone 14 Pro
1                                     Dyson V11 Vacuum
2                                     Levi's 501 Jeans
3                                    The Da Vinci Code
4                              Neutrogena Skincare Set
                            ...                       
235    Nespresso Vertuo Next Coffee and Espresso Maker
236                          Nike Air Force 1 Sneakers
237             The Handmaid's Tale by Margaret Atwood
238               Sunday Riley Luna Sleeping Night Oil
239                         Yeti Rambler 20 oz Tumbler
Name: Product Name, Length: 240, dtype: object
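
Selecting with a single column name returns a Series, while selecting with a list of names returns a DataFrame. A minimal sketch on a two-row stand-in for the sales data (the frame `sales` below is assumed for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    'Product Name': ['iPhone 14 Pro', 'Dyson V11 Vacuum'],
    'Units Sold': [2, 1],
    'Region': ['North America', 'Europe'],
})

one_column = sales['Product Name']               # Series
two_columns = sales[['Product Name', 'Region']]  # DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
```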

In [32]:
# `df.loc[0]` fetches the row in your DataFrame `df` that has the index label `0`.
df.loc[0]
# print(type(df.loc[0]))

Transaction ID              10001
Date                   2024-01-01
Product Category      Electronics
Product Name        iPhone 14 Pro
Units Sold                      2
Unit Price                 999.99
Total Revenue             1999.98
Region              North America
Payment Method        Credit Card
Name: 0, dtype: object

In [33]:
# `df.iloc[0]` gives you the first row of the DataFrame `df` as a Pandas Series.
df.iloc[0]
# print(type(df.iloc[0]))

Transaction ID              10001
Date                   2024-01-01
Product Category      Electronics
Product Name        iPhone 14 Pro
Units Sold                      2
Unit Price                 999.99
Total Revenue             1999.98
Region              North America
Payment Method        Credit Card
Name: 0, dtype: object

In [37]:
## Access a single element by row label and column label with .at
df.at[0,'Product Name']

'iPhone 14 Pro'

In [38]:
df.at[2,'Product Category']

'Clothing'

In [39]:
## Access a single element by integer position with .iat (row position 2, column position 2)
df.iat[2,2]

'Clothing'
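
`.at` and `.iat` are the scalar counterparts of `.loc` and `.iloc`: the first takes labels, the second takes integer positions. The sketch below uses a hypothetical frame with string row labels so the two cannot be confused:

```python
import pandas as pd

items = pd.DataFrame(
    {'Category': ['Electronics', 'Clothing'], 'Units': [2, 3]},
    index=['t1', 't2'],   # illustrative row labels
)

print(items.at['t2', 'Category'])  # label-based scalar access
print(items.iat[1, 0])             # position-based scalar access

items.at['t2', 'Units'] = 5        # .at can also assign a single cell
print(items.loc['t2', 'Units'])
```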

In [41]:
df.head(2)

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal


In [53]:
## Create a Data frame From a List of Dictionaries

data=[
    {'Name':'Kapil','Age':32,'City':'Bangalore'},
    {'Name':'John','Age':34,'City':'Bangalore'},
    {'Name':'Bappy','Age':32,'City':'Bangalore'},
    {'Name':'JAck','Age':32,'City':'Bangalore'}
    
]
df=pd.DataFrame(data)
print(df)
print(type(df))

# ✔️ Tip: before assigning a list as a new column, check that len(your_list) == len(df).
len(df)

    Name  Age       City
0  Kapil   32  Bangalore
1   John   34  Bangalore
2  Bappy   32  Bangalore
3   JAck   32  Bangalore
<class 'pandas.core.frame.DataFrame'>


4

In [56]:
## Adding a column
salaries = [50000, 60000, 70000, 80000]  # OK: one value per row (4 rows)
# salaries = [50000, 60000, 70000]   # raises ValueError: 3 values for 4 rows
df['Salary'] = salaries
df

    Name  Age       City  Salary
0  Kapil   32  Bangalore   50000
1   John   34  Bangalore   60000
2  Bappy   32  Bangalore   70000
3   JAck   32  Bangalore   80000
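
Assigning a list whose length differs from the number of rows raises a `ValueError`, whereas a single scalar is broadcast to every row. A minimal sketch (the `Country` column and its value are made up for illustration):

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Kapil', 'John', 'Bappy', 'JAck']})
too_short = [50000, 60000, 70000]   # 3 values for 4 rows

try:
    staff['Salary'] = too_short
except ValueError as exc:
    print('Assignment failed:', exc)

# A scalar is repeated for every row, so no length check is needed.
staff['Country'] = 'India'
print(staff)
```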


In [57]:
## Remove a column (axis=1 targets columns; inplace=True modifies df directly and returns None)
df.drop('Salary',axis=1,inplace=True)

In [58]:
df

    Name  Age       City
0  Kapil   32  Bangalore
1   John   34  Bangalore
2  Bappy   32  Bangalore
3   JAck   32  Bangalore
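
Without `inplace=True`, `drop` leaves the original untouched and returns a new DataFrame. A short sketch, using the `columns=` keyword as an alternative to `axis=1` (the frame `staff` is illustrative):

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Kapil', 'John'], 'Salary': [50000, 60000]})

trimmed = staff.drop(columns='Salary')   # returns a new DataFrame

print(staff.columns.tolist())     # original still has both columns
print(trimmed.columns.tolist())   # copy has only 'Name'
```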


In [59]:
## Increase every value in the Age column by 1
df['Age']=df['Age']+1
df

    Name  Age       City
0  Kapil   33  Bangalore
1   John   35  Bangalore
2  Bappy   33  Bangalore
3   JAck   33  Bangalore


In [60]:
## Remove the row with index label 0 (axis=0, the default, targets rows)
df.drop(0,inplace=True)

In [61]:
df

    Name  Age       City
1   John   35  Bangalore
2  Bappy   33  Bangalore
3   JAck   33  Bangalore
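
After a row is dropped, the remaining index labels are not renumbered, so label 0 no longer exists even though there is still a first row by position. A minimal sketch, with `reset_index(drop=True)` shown as one way to get a fresh 0-based index:

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Kapil', 'John', 'Bappy', 'JAck']})
staff = staff.drop(0)                 # remove the row labelled 0

print(staff.index.tolist())           # labels keep their old values
print(staff.iloc[0]['Name'])          # first remaining row, by position

renumbered = staff.reset_index(drop=True)
print(renumbered.index.tolist())      # labels start from 0 again
```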


In [65]:
df=pd.read_csv('sales_data.csv')
df.head(5)

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal


In [66]:
# Display the data types of each column
# print("Data types:\n", df.dtypes)

# Describe the DataFrame
# `df.describe()` calculates and shows descriptive statistics 
# (like count, mean, standard deviation, min, max, and quartiles) for the **numerical columns** 
# in your DataFrame `df`. It gives you a quick overview of the data's distribution.
print("Statistical summary:\n", df.describe())



Statistical summary:
        Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000


In [67]:
df.describe()

        Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000
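
By default `describe()` skips text columns. Passing `include='all'` adds them, reporting `count`, `unique`, `top`, and `freq` for non-numeric data. A sketch on a made-up four-row frame (`orders` is not the sales dataset):

```python
import pandas as pd

orders = pd.DataFrame({
    'Region': ['Asia', 'Europe', 'Asia', 'Europe'],
    'Units Sold': [1, 2, 3, 4],
})

summary = orders.describe()                # numeric columns only
print(summary.loc['mean', 'Units Sold'])   # matches orders['Units Sold'].mean()

full = orders.describe(include='all')      # numeric and text columns
print(full)
```

Rows that do not apply to a column, such as `mean` for `Region`, are filled with `NaN` in the combined summary.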
