## Pandas: DataFrame and Series

Pandas is a powerful open-source library in Python used for data manipulation and analysis. It provides two primary data structures: DataFrame (for tabular data) and Series (for one-dimensional data). Pandas makes it easy to clean, filter, aggregate, and visualize data, and is widely used in data science and machine learning workflows.

In [1]:
!pip install pandas



In [2]:

import pandas as pd

In [5]:
## Series
## A pandas Series is a one-dimensional, array-like object that can hold any data type. It is similar to a 1D array...

data = [1,2,3,4,5]
series = pd.Series(data)
print("Series \n", series)

Series 
 0    1
1    2
2    3
3    4
4    5
dtype: int64
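A Series pairs an array of values with an index of labels. The sketch below is a minimal illustration of that idea; the variable names `values`, `labels`, and `doubled` are assumptions introduced here and are not part of the original notebook.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

# A Series stores values alongside an index of labels.
values = series.to_list()        # the data as a plain Python list
labels = series.index.to_list()  # default labels are 0..n-1

# Arithmetic is applied element-wise, with no explicit loop.
doubled = series * 2

print(values)
print(labels)
print(doubled.to_list())
```

Because operations are element-wise, most column-level calculations can be written as a single expression.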


In [7]:
## Create a Series from a dictionary (keys become the index labels)

data = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

series = pd.Series(data)
print(series)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [10]:
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
pd.Series(data, index=index)

a    10
b    20
c    30
d    40
e    50
dtype: int64
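With a custom index, a value can be looked up either by its label or by its integer position. The following is a small sketch of both styles, assuming the same data as the cell above; the names `by_label`, `by_position`, and `label_slice` are illustrative only.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = series.loc['c']         # look up by index label
by_position = series.iloc[2]       # look up by integer position
label_slice = series.loc['b':'d']  # label slices include both endpoints

print(by_label, by_position)
print(label_slice.to_list())
```

Note that a label slice keeps the end label, unlike ordinary Python slicing by position.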

In [14]:
## DataFrame
## Create a DataFrame from a dictionary of lists

data = {
    'Name': ['chandan', 'nikhil', 'abhay'],
    'age': [24, 25, 26],
    'city': ['Delhi', 'gaya', 'patna']
}

df = pd.DataFrame(data)
print(df)
print(type(df))


      Name  age   city
0  chandan   24  Delhi
1   nikhil   25   gaya
2    abhay   26  patna
<class 'pandas.core.frame.DataFrame'>
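Each column of a DataFrame is itself a Series. As a rough sketch using the same data as the cell above (the names `ages` and `subset` are assumptions for illustration), selecting one column returns a Series, while selecting a list of columns returns a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['chandan', 'nikhil', 'abhay'],
    'age': [24, 25, 26],
    'city': ['Delhi', 'gaya', 'patna'],
})

ages = df['age']               # single column name -> Series
subset = df[['Name', 'city']]  # list of column names -> DataFrame

print(type(ages).__name__, type(subset).__name__)
print(df.shape)  # (rows, columns)
```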


In [18]:
## Create a DataFrame from a list of dictionaries

data = [
    {"Name": 'Chandan', 'Age': 24, 'City':'Delhi'},
    {"Name": 'Nikhil', 'Age': 25, 'City':'Delhi'},
    {"Name": 'Abhya', 'Age': 25, 'City':'Delhi'},
    {"Name": 'Taqueer', 'Age': 25, 'City':'Delhi'},
    {"Name": 'Vivek', 'Age': 25, 'City':'Delhi'},
] 

df = pd.DataFrame(data)
print(df)
print(type(df))

      Name  Age   City
0  Chandan   24  Delhi
1   Nikhil   25  Delhi
2    Abhya   25  Delhi
3  Taqueer   25  Delhi
4    Vivek   25  Delhi
<class 'pandas.core.frame.DataFrame'>
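When a DataFrame is built from a list of dictionaries, each dictionary becomes a row and the keys become columns. The sketch below uses a hypothetical `records` list in which one entry omits `City`, to show that pandas fills the gap with a missing value rather than raising an error.

```python
import pandas as pd

# Hypothetical records; the second one deliberately has no 'City' key.
records = [
    {'Name': 'Chandan', 'Age': 24, 'City': 'Delhi'},
    {'Name': 'Nikhil', 'Age': 25},
]

df_records = pd.DataFrame(records)

# isna() flags the cells that pandas filled in as missing.
missing_city = df_records['City'].isna().to_list()

print(list(df_records.columns))
print(missing_city)
```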


In [None]:
## Load data from a CSV file

df = pd.read_csv('sales_data.csv')
df.head(8)  ## Show the first 8 rows

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal
5,10006,2024-01-06,Sports,Wilson Evolution Basketball,5,29.99,149.95,Asia,Credit Card
6,10007,2024-01-07,Electronics,MacBook Pro 16-inch,1,2499.99,2499.99,North America,Credit Card
7,10008,2024-01-08,Home Appliances,Blueair Classic 480i,2,599.99,1199.98,Europe,PayPal


In [23]:
## Show the last 5 rows
df.tail(5)

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
235,10236,2024-08-23,Home Appliances,Nespresso Vertuo Next Coffee and Espresso Maker,1,159.99,159.99,Europe,PayPal
236,10237,2024-08-24,Clothing,Nike Air Force 1 Sneakers,3,90.0,270.0,Asia,Debit Card
237,10238,2024-08-25,Books,The Handmaid's Tale by Margaret Atwood,3,10.99,32.97,North America,Credit Card
238,10239,2024-08-26,Beauty Products,Sunday Riley Luna Sleeping Night Oil,1,55.0,55.0,Europe,PayPal
239,10240,2024-08-27,Sports,Yeti Rambler 20 oz Tumbler,2,29.99,59.98,Asia,Credit Card
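To try `read_csv`, `head`, and `tail` without the `sales_data.csv` file, the same calls can be run against an in-memory string. This is a sketch under that assumption: `csv_text` is a hypothetical stand-in containing three rows and a subset of the columns shown in the output above.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales_data.csv (three rows, five columns).
csv_text = """Transaction ID,Date,Product Category,Units Sold,Unit Price
10001,2024-01-01,Electronics,2,999.99
10002,2024-01-02,Home Appliances,1,499.99
10003,2024-01-03,Clothing,3,69.99
"""

# read_csv accepts any file-like object, not only a path.
sales = pd.read_csv(io.StringIO(csv_text))

first_two = sales.head(2)  # first 2 rows
last_one = sales.tail(1)   # last row

print(sales.shape)
print(first_two['Transaction ID'].to_list())
print(last_one['Product Category'].iloc[0])
```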


In [27]:
## Access DataFrame elements

## Rebuild the DataFrame from the list of dictionaries defined earlier
df = pd.DataFrame(data)
df


Unnamed: 0,Name,Age,City
0,Chandan,24,Delhi
1,Nikhil,25,Delhi
2,Abhya,25,Delhi
3,Taqueer,25,Delhi
4,Vivek,25,Delhi


In [29]:
df.loc[0]

Name    Chandan
Age          24
City      Delhi
Name: 0, dtype: object

In [32]:
df.iloc[3]  ## select a row by integer position

Name    Taqueer
Age          25
City      Delhi
Name: 3, dtype: object

In [35]:
## Access a single element by row label and column name
df.at[1, 'Age']

np.int64(25)
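With the default index, `loc` and `iloc` appear to do the same thing because labels and positions coincide. The sketch below uses a hypothetical non-default index (`10, 20, 30`) to make the difference visible; `at` and `iat` are the scalar counterparts of `loc` and `iloc`.

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {'Name': ['Chandan', 'Nikhil', 'Abhya'], 'Age': [24, 25, 25]},
    index=[10, 20, 30],  # hypothetical labels, chosen to differ from positions
)

name_by_label = df_labeled.loc[20, 'Name']  # row with label 20
name_by_position = df_labeled.iloc[0, 0]    # first row, first column
age_by_label = df_labeled.at[30, 'Age']     # scalar access by label
age_by_position = df_labeled.iat[2, 1]      # scalar access by position

print(name_by_label, name_by_position, age_by_label, age_by_position)
```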

In [None]:
print(df.at[0, 'Name'])

## Add a new column (the list needs one value per row)
df['Salary'] = [1000, 2000, 3000, 4000, 5000]
df

Chandan


Unnamed: 0,Name,Age,City,Salary
0,Chandan,24,Delhi,1000
1,Nikhil,25,Delhi,2000
2,Abhya,25,Delhi,3000
3,Taqueer,25,Delhi,4000
4,Vivek,25,Delhi,5000
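A new column can also be computed from existing ones. In this sketch the `Bonus` and `Total` columns are hypothetical, introduced only to illustrate the two common styles: direct assignment, which modifies the DataFrame, and `assign`, which returns a new one.

```python
import pandas as pd

pay = pd.DataFrame({'Name': ['Chandan', 'Nikhil'], 'Salary': [1000, 2000]})

# Direct assignment adds the column to `pay` itself.
pay['Bonus'] = pay['Salary'] // 10  # hypothetical 10% bonus

# assign() leaves `pay` untouched and returns a new DataFrame.
with_total = pay.assign(Total=pay['Salary'] + pay['Bonus'])

print(pay.columns.to_list())
print(with_total['Total'].to_list())
```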


In [41]:
## Remove a column; errors='ignore' makes the cell safe to re-run once the column is gone
df.drop('Salary', axis=1, inplace=True, errors='ignore')

In [43]:
df

Unnamed: 0,Name,Age,City
0,Chandan,24,Delhi
1,Nikhil,25,Delhi
2,Abhya,25,Delhi
3,Taqueer,25,Delhi
4,Vivek,25,Delhi
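Dropping a column that no longer exists raises a `KeyError`, which commonly happens when a notebook cell with `inplace=True` is run a second time. The sketch below, with illustrative names, shows the non-mutating form of `drop` and the `errors='ignore'` option that tolerates a missing label.

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Chandan', 'Nikhil'], 'Salary': [1000, 2000]})

# Without inplace=True, drop() returns a new DataFrame and leaves `staff` alone.
trimmed = staff.drop(columns='Salary')

# Dropping the same column again fails by default...
try:
    trimmed.drop(columns='Salary')
    raised = False
except KeyError:
    raised = True

# ...unless missing labels are explicitly ignored.
still_trimmed = trimmed.drop(columns='Salary', errors='ignore')

print(staff.columns.to_list(), trimmed.columns.to_list(), raised)
```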


In [45]:
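## Increase every value in the Age column by 1
## Note: each run of this cell adds 1 again; the output below reflects two runs
df['Age'] = df['Age']+1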
df

Unnamed: 0,Name,Age,City
0,Chandan,26,Delhi
1,Nikhil,27,Delhi
2,Abhya,27,Delhi
3,Taqueer,27,Delhi
4,Vivek,27,Delhi


In [None]:
df.drop(0)  ## returns a copy without row 0; df itself is unchanged

Unnamed: 0,Name,Age,City
1,Nikhil,27,Delhi
2,Abhya,27,Delhi
3,Taqueer,27,Delhi
4,Vivek,27,Delhi


In [51]:
## Permanently delete row 3 (inplace=True modifies df itself)
## errors='ignore' makes the cell safe to re-run once the row is gone
df.drop(3, inplace=True, errors='ignore')

In [52]:
df

Unnamed: 0,Name,Age,City
0,Chandan,26,Delhi
1,Nikhil,27,Delhi
2,Abhya,27,Delhi
4,Vivek,27,Delhi
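After a row is dropped, the remaining index labels keep their old values, so labels and positions no longer line up. A minimal sketch of this, and of renumbering with `reset_index`, is shown below; the variable names are illustrative.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Chandan', 'Nikhil', 'Abhya', 'Taqueer', 'Vivek']})
people = people.drop(3)  # remove the row labelled 3

labels_after_drop = people.index.to_list()  # label 3 is now missing
fourth_by_position = people.iloc[3, 0]      # position 3 now holds label 4

# drop=True discards the old labels instead of keeping them as a column.
renumbered = people.reset_index(drop=True)

print(labels_after_drop)
print(fourth_by_position)
print(renumbered.index.to_list())
```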


In [60]:
df = pd.read_csv('sales_data.csv')
df.head(5)

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal


In [62]:
## Display the data type of each column
print("Data types: \n", df.dtypes)

## Summarize the numeric columns (describe is a method, so it must be called)
print("Statistical summary: \n", df.describe())



Data types: 
 Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Region               object
Payment Method       object
dtype: object
Statistical summary: 
        Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000

In [63]:
df.describe()

Unnamed: 0,Transaction ID,Units Sold,Unit Price,Total Revenue
count,240.0,240.0,240.0,240.0
mean,10120.5,2.158333,236.395583,335.699375
std,69.42622,1.322454,429.446695,485.804469
min,10001.0,1.0,6.5,6.5
25%,10060.75,1.0,29.5,62.965
50%,10120.5,2.0,89.99,179.97
75%,10180.25,3.0,249.99,399.225
max,10240.0,10.0,3899.99,3899.99
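`describe` adapts to the kind of data it is given: numeric data gets count, mean, and quartiles, while text data gets count, unique, top, and freq. The sketch below uses a small hypothetical frame rather than `sales_data.csv`, and also shows converting a text `Date` column to a datetime type with `pd.to_datetime`.

```python
import pandas as pd

# Hypothetical three-row sample with the same kinds of columns as the sales data.
sample = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'Region': ['North America', 'Europe', 'Asia'],
    'Units Sold': [2, 1, 3],
})

# Dates read from CSV arrive as text; convert them explicitly.
sample['Date'] = pd.to_datetime(sample['Date'])

numeric_summary = sample['Units Sold'].describe()  # count, mean, std, min, ...
text_summary = sample['Region'].describe()         # count, unique, top, freq

print(numeric_summary['mean'])
print(text_summary['unique'])
print(sample['Date'].dt.day.to_list())
```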
