### Pandas DataFrame and Series
Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional, array-like object with an index, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

#### Series


In [34]:
import pandas as pd
data =[1,2,3,4,5]
series=pd.Series(data)
print(series)
print(type(series))

0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


#### Series from a dictionary

In [35]:

data={'a':1,'b':2,'c':3}
series_dic=pd.Series(data)
# The dictionary keys become the index
print(series_dic)

a    1
b    2
c    3
dtype: int64


#### Series using data and index

In [36]:

data=[1,2,3]
index=['a','b','c']
print(pd.Series(data,index=index))

a    1
b    2
c    3
dtype: int64
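
Once a Series has labels, a value can be read either by its label or by its position. The following is a minimal sketch that reuses the data and index from the cell above; the variable names are only illustrative.

```python
import pandas as pd

# Same data and labels as in the cell above
s = pd.Series([1, 2, 3], index=["a", "b", "c"])

by_label = s["b"]          # look up by index label
by_position = s.iloc[1]    # look up by integer position
labels = s.index.tolist()  # the index as a plain Python list

print(by_label, by_position, labels)
```

Both lookups return the same element here because the label `"b"` sits at position 1.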


#### DataFrame


In [37]:
# Create a DataFrame from a dictionary of lists
data={
    "Name":["Jhon","Mike","Aditya"],
    "Age":[23,45,25],
    "City": ["Alld","Banda","Kanpur"]

}
df=pd.DataFrame(data)
print(type(df))
print(df)

<class 'pandas.core.frame.DataFrame'>
     Name  Age    City
0    Jhon   23    Alld
1    Mike   45   Banda
2  Aditya   25  Kanpur


#### Create a DataFrame from a list of dictionaries

In [38]:

data=[
    {"Name":"Aditya","Age":29,"City":"Banglore"},
    {"Name":"Shobhita","Age":30,"City":"Delhi"},
    {"Name":"David","Age":29,"City":"Kanpur"}
]
df=pd.DataFrame(data)
print(df)

       Name  Age      City
0    Aditya   29  Banglore
1  Shobhita   30     Delhi
2     David   29    Kanpur
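
When the dictionaries do not all share the same keys, pandas still builds the frame and fills the gaps with NaN. The sketch below assumes a hypothetical second record without a "City" key; it is not part of the original data.

```python
import pandas as pd

# Hypothetical records: the second one has no "City" key
records = [
    {"Name": "Aditya", "Age": 29, "City": "Banglore"},
    {"Name": "Shobhita", "Age": 30},
]
people = pd.DataFrame(records)

# Keys missing from a record show up as NaN in that row
print(people)
print(people["City"].isna().tolist())
```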


In [39]:
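# Load the sample sales data from a CSV file
df=pd.read_csv("sales_data_sample.csv")

# A notebook cell displays only its last expression,
# so of the three lines below only df.head(5) produces output
df
df.tail(4)  # last 4 rows
df.head(5)  # first 5 rows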

   Order ID        Date  Customer Name           Product     Category  Quantity  Unit Price  Total
0      1001  2025-06-01   Anita Sharma   Apple iPhone 14  Electronics         1       79900  79900
1      1002  2025-06-01   Raj Malhotra  Noise Smartwatch      Fashion         2        2999   5998
2      1003  2025-06-02    Seema Yadav      Levi's Jeans     Clothing         1        2999   2999
3      1004  2025-06-02   Vikram Singh       Dell Laptop  Electronics         1       58999  58999
4      1005  2025-06-03   Preeti Mehra    Lakme Lipstick       Beauty         3         450   1350
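
To try `read_csv`, `head` and `tail` without the CSV file on disk, the text can be fed in from memory. This is a minimal sketch: the three rows are copied from the output above, and `io.StringIO` stands in for `sales_data_sample.csv`.

```python
import io
import pandas as pd

# A few rows copied from the output above; StringIO stands in for the file on disk
csv_text = """Order ID,Date,Customer Name,Product,Category,Quantity,Unit Price,Total
1001,2025-06-01,Anita Sharma,Apple iPhone 14,Electronics,1,79900,79900
1002,2025-06-01,Raj Malhotra,Noise Smartwatch,Fashion,2,2999,5998
1003,2025-06-02,Seema Yadav,Levi's Jeans,Clothing,1,2999,2999
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))  # first two rows
print(sample.tail(1))  # last row
print(sample.shape)    # (number of rows, number of columns)
```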


In [40]:
print(type(df['Order ID']))
df['Order ID']  # Selecting a single column returns a Series


<class 'pandas.core.series.Series'>


0    1001
1    1002
2    1003
3    1004
4    1005
5    1006
6    1007
7    1008
8    1009
9    1010
Name: Order ID, dtype: int64

#### Using loc

In [59]:
# loc selects by label. With the default index the row labels are 0, 1, 2, ...
# Print the complete row labelled 1 (the second row)
print(df.loc[1])

# Get a single value as a native Python type
print(df.loc[1]['Customer Name'])
print(type(df.loc[1]['Customer Name']))


# Passing the column as a list returns a Series for that row, not a DataFrame
print(df.loc[1,['Customer Name']])
print(type(df.loc[1,['Customer Name']]))

# Slice rows and list columns to get a DataFrame; loc slices include the end label
print(df.loc[1:3,['Customer Name',"Order ID"]])

# With string column labels, loc does not accept integer column positions; use iloc for that
#df.loc[1:3, 0:2]


# The second argument must be column labels
print(df.loc[1:5,["Order ID"]])


Order ID                     1002
Date                   2025-06-01
Customer Name        Raj Malhotra
Product          Noise Smartwatch
Category                  Fashion
Quantity                        2
Unit Price                   2999
Total                        5998
Name: 1, dtype: object
Raj Malhotra
<class 'str'>
Customer Name    Raj Malhotra
Name: 1, dtype: object
<class 'pandas.core.series.Series'>
  Customer Name  Order ID
1  Raj Malhotra      1002
2   Seema Yadav      1003
3  Vikram Singh      1004
   Order ID
1      1002
2      1003
3      1004
4      1005
5      1006
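
Because `loc` works on labels, it behaves the same way when the row labels are not 0, 1, 2, and so on. The sketch below assumes a hypothetical frame that uses order IDs as its index; the frame and its names are illustrative only.

```python
import pandas as pd

# Hypothetical frame whose row labels are order IDs rather than 0, 1, 2, ...
orders = pd.DataFrame(
    {"Customer Name": ["Anita Sharma", "Raj Malhotra", "Seema Yadav"],
     "Quantity": [1, 2, 1]},
    index=[1001, 1002, 1003],
)

row = orders.loc[1002]                            # one row, returned as a Series
name = orders.loc[1002, "Customer Name"]          # one cell, returned as a scalar
block = orders.loc[1001:1002, ["Customer Name"]]  # label slice, end label included

print(row)
print(name)
print(block)
```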


#### iloc

In [9]:
# iloc selects rows and columns by integer position; the end of each slice is excluded
df.iloc[0:4,1:4]

         Date  Customer Name           Product
0  2025-06-01   Anita Sharma   Apple iPhone 14
1  2025-06-01   Raj Malhotra  Noise Smartwatch
2  2025-06-02    Seema Yadav      Levi's Jeans
3  2025-06-02   Vikram Singh       Dell Laptop
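
The main practical difference between the two accessors is how slices end: `loc` includes the end label, while `iloc` stops before the end position. A small sketch on a hypothetical five-row frame (the name `demo` is illustrative):

```python
import pandas as pd

# Hypothetical five-row frame with the default 0..4 row labels
demo = pd.DataFrame(
    {"Order ID": [1001, 1002, 1003, 1004, 1005],
     "Quantity": [1, 2, 1, 1, 3]}
)

by_label = demo.loc[1:3]        # labels 1, 2 and 3 (end label included)
by_position = demo.iloc[1:3]    # positions 1 and 2 (end position excluded)
first_column = demo.iloc[:, 0]  # every row, first column by position

print(len(by_label), len(by_position))
print(first_column.name)
```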


#### Accessing a specific element by row label and column name

In [10]:

df.at[3,'Date']

'2025-06-02'

#### Accessing a specific element by row position and column position

In [11]:

df.iat[2,2]

'Seema Yadav'
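
`at` and `iat` each address exactly one cell, and they can be used for assignment as well as for reading. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame
cells = pd.DataFrame(
    {"Date": ["2025-06-01", "2025-06-02"],
     "Quantity": [1, 2]}
)

value_by_label = cells.at[1, "Date"]  # row label 1, column name "Date"
value_by_position = cells.iat[1, 0]   # second row, first column

cells.at[0, "Quantity"] = 5           # at also works on the left of an assignment

print(value_by_label, value_by_position)
print(cells)
```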

#### Data manipulation with the dataset

In [12]:

# Add a new column; the list needs one value per row
df['amount_spent']=[500,400,430,453,500,400,430,453,500,400]
df
# drop() looks for the given label in the row index (axis=0) by default;
# pass axis=1 to drop a column instead
df.drop("Category",axis=1)

   Order ID        Date  Customer Name                 Product  Quantity  Unit Price  Total  amount_spent
0      1001  2025-06-01   Anita Sharma         Apple iPhone 14         1       79900  79900           500
1      1002  2025-06-01   Raj Malhotra        Noise Smartwatch         2        2999   5998           400
2      1003  2025-06-02    Seema Yadav            Levi's Jeans         1        2999   2999           430
3      1004  2025-06-02   Vikram Singh             Dell Laptop         1       58999  58999           453
4      1005  2025-06-03   Preeti Mehra          Lakme Lipstick         3         450   1350           500
5      1006  2025-06-03   Suresh Kumar      Nike Running Shoes         2        5999  11998           400
6      1007  2025-06-04   Kavita Joshi     Wooden Dining Table         1       24999  24999           430
7      1008  2025-06-04     Amit Patel              HP Printer         1        7499   7499           453
8      1009  2025-06-05   Ritika Gupta  Boat Bluetooth Speaker         2        1999   3998           500
9      1010  2025-06-05    Anil Kapoor            Puma T-Shirt         4         899   3596           400


##### Displaying df at this point would still show Category

In [13]:

# drop() returns a new DataFrame and leaves df itself unchanged

# Pass inplace=True to modify df directly
df.drop("Category",axis=1,inplace=True)

In [14]:
df  # Category is gone now

   Order ID        Date  Customer Name                 Product  Quantity  Unit Price  Total  amount_spent
0      1001  2025-06-01   Anita Sharma         Apple iPhone 14         1       79900  79900           500
1      1002  2025-06-01   Raj Malhotra        Noise Smartwatch         2        2999   5998           400
2      1003  2025-06-02    Seema Yadav            Levi's Jeans         1        2999   2999           430
3      1004  2025-06-02   Vikram Singh             Dell Laptop         1       58999  58999           453
4      1005  2025-06-03   Preeti Mehra          Lakme Lipstick         3         450   1350           500
5      1006  2025-06-03   Suresh Kumar      Nike Running Shoes         2        5999  11998           400
6      1007  2025-06-04   Kavita Joshi     Wooden Dining Table         1       24999  24999           430
7      1008  2025-06-04     Amit Patel              HP Printer         1        7499   7499           453
8      1009  2025-06-05   Ritika Gupta  Boat Bluetooth Speaker         2        1999   3998           500
9      1010  2025-06-05    Anil Kapoor            Puma T-Shirt         4         899   3596           400
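
An alternative to `inplace=True` is to assign the result of `drop` to a name, which keeps the original frame available. The sketch below uses a hypothetical two-row frame and the `columns=` keyword, which is equivalent to passing the label with `axis=1`.

```python
import pandas as pd

# Hypothetical two-row frame with a Category column
items = pd.DataFrame(
    {"Product": ["Dell Laptop", "Lakme Lipstick"],
     "Category": ["Electronics", "Beauty"],
     "Quantity": [1, 3]}
)

# drop() returns a new frame; columns=[...] is the same as drop("Category", axis=1)
trimmed = items.drop(columns=["Category"])

print(list(items.columns))    # the original still has Category
print(list(trimmed.columns))  # the returned frame does not
```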


#### Add 1 to Order ID

In [15]:

df["Order ID"]=df["Order ID"]+1
df



   Order ID        Date  Customer Name                 Product  Quantity  Unit Price  Total  amount_spent
0      1002  2025-06-01   Anita Sharma         Apple iPhone 14         1       79900  79900           500
1      1003  2025-06-01   Raj Malhotra        Noise Smartwatch         2        2999   5998           400
2      1004  2025-06-02    Seema Yadav            Levi's Jeans         1        2999   2999           430
3      1005  2025-06-02   Vikram Singh             Dell Laptop         1       58999  58999           453
4      1006  2025-06-03   Preeti Mehra          Lakme Lipstick         3         450   1350           500
5      1007  2025-06-03   Suresh Kumar      Nike Running Shoes         2        5999  11998           400
6      1008  2025-06-04   Kavita Joshi     Wooden Dining Table         1       24999  24999           430
7      1009  2025-06-04     Amit Patel              HP Printer         1        7499   7499           453
8      1010  2025-06-05   Ritika Gupta  Boat Bluetooth Speaker         2        1999   3998           500
9      1011  2025-06-05    Anil Kapoor            Puma T-Shirt         4         899   3596           400
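
Column arithmetic like this is applied element by element, and two columns can be combined the same way. A minimal sketch using three Quantity and Unit Price pairs from the sales data; the column name "Quantity plus one" is hypothetical.

```python
import pandas as pd

# Three Quantity / Unit Price pairs taken from the sales data shown earlier
sales = pd.DataFrame(
    {"Quantity": [1, 2, 3],
     "Unit Price": [79900, 2999, 450]}
)

# Arithmetic on columns is applied element by element, with no explicit loop
sales["Total"] = sales["Quantity"] * sales["Unit Price"]
sales["Quantity plus one"] = sales["Quantity"] + 1  # hypothetical extra column

print(sales)
```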


#### Drop a row by its index label

In [16]:

df.drop(7)

   Order ID        Date  Customer Name                 Product  Quantity  Unit Price  Total  amount_spent
0      1002  2025-06-01   Anita Sharma         Apple iPhone 14         1       79900  79900           500
1      1003  2025-06-01   Raj Malhotra        Noise Smartwatch         2        2999   5998           400
2      1004  2025-06-02    Seema Yadav            Levi's Jeans         1        2999   2999           430
3      1005  2025-06-02   Vikram Singh             Dell Laptop         1       58999  58999           453
4      1006  2025-06-03   Preeti Mehra          Lakme Lipstick         3         450   1350           500
5      1007  2025-06-03   Suresh Kumar      Nike Running Shoes         2        5999  11998           400
6      1008  2025-06-04   Kavita Joshi     Wooden Dining Table         1       24999  24999           430
8      1010  2025-06-05   Ritika Gupta  Boat Bluetooth Speaker         2        1999   3998           500
9      1011  2025-06-05    Anil Kapoor            Puma T-Shirt         4         899   3596           400
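
Dropping a row leaves a gap in the index labels (7 is missing above), and, as with columns, the original frame is unchanged unless the result is assigned. If consecutive labels are wanted afterwards, `reset_index(drop=True)` renumbers the rows. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical three-row frame with the default labels 0, 1, 2
rows = pd.DataFrame({"Order ID": [1002, 1003, 1004]})

kept = rows.drop(1)                       # removes the row labelled 1
renumbered = kept.reset_index(drop=True)  # relabels the remaining rows 0, 1

print(kept.index.tolist())
print(renumbered.index.tolist())
print(len(rows))  # the original frame still has all its rows
```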


#### Print the data type of each column and a statistical summary

In [17]:

print("Data Types \n",df.dtypes)

# Describe the DataFrame

print("Statistical Summary : \n",df.describe())

# Keep the summary in its own variable so that df is not overwritten
summary=df.describe()
summary.at["std","Quantity"]

Data Types 
 Order ID          int64
Date             object
Customer Name    object
Product          object
Quantity          int64
Unit Price        int64
Total             int64
amount_spent      int64
dtype: object
Statistical Summary : 
          Order ID   Quantity    Unit Price         Total  amount_spent
count    10.00000  10.000000     10.000000     10.000000     10.000000
mean   1006.50000   1.800000  18674.200000  20133.600000    446.600000
std       3.02765   1.032796  28122.356175  27309.172741     41.769739
min    1002.00000   1.000000    450.000000   1350.000000    400.000000
25%    1004.25000   1.000000   2249.000000   3696.500000    407.500000
50%    1006.50000   1.500000   4499.000000   6748.500000    441.500000
75%    1008.75000   2.000000  20624.000000  21748.750000    488.250000
max    1011.00000   4.000000  79900.000000  79900.000000    500.000000


np.float64(1.0327955589886446)
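
The result of `describe()` is itself a DataFrame, so individual statistics can be read with `at` or `loc` and compared against the matching column methods. The sketch below rebuilds only the Quantity column from the ten rows shown earlier; the names `quantities` and `summary` are illustrative.

```python
import pandas as pd

# The ten Quantity values from the sales data shown earlier
quantities = pd.DataFrame({"Quantity": [1, 2, 1, 1, 3, 2, 1, 1, 2, 4]})

summary = quantities.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max

std_from_summary = summary.at["std", "Quantity"]
std_direct = quantities["Quantity"].std()  # sample standard deviation (ddof=1)

print(std_from_summary)
print(std_direct)
```

Both values should agree with the standard deviation reported in the summary above, since `describe()` uses the same sample standard deviation as `Series.std()`.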