#### Pandas: DataFrame and Series
Pandas is a Python library for data manipulation, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional, size-mutable table with labeled axes (rows and columns) whose columns can hold different data types.

In [1]:
import pandas as pd

# Create a Series from a list; a default integer index (0, 1, 2, ...) is assigned

data=[1,2,3,4,5]
series=pd.Series(data)
print(series)
print(type(series))

0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


In [2]:
# Create a Series from a dictionary; the keys become the index labels

data={'a':1,'b':2,'c':3}
series_dict=pd.Series(data)
print(series_dict)

a    1
b    2
c    3
dtype: int64


In [3]:
# Create a Series with a custom index
data=[10,20,30]
index=['a','b','c']
pd.Series(data,index=index)

a    10
b    20
c    30
dtype: int64
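
A custom index changes how values are looked up and combined. The following is a minimal sketch, reusing the values from the cell above; the names `s`, `other`, and `total` are illustrative and not part of the original notebook. It shows label-based access with `.loc`, position-based access with `.iloc`, and that arithmetic between two Series matches values by index label rather than by position.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# The same element, reached by label and by integer position
by_label = s.loc['b']
by_position = s.iloc[1]
print(by_label, by_position)

# A second Series with the same labels in a different order (illustrative data)
other = pd.Series([1, 2, 3], index=['c', 'b', 'a'])

# Addition pairs up values that share a label: 'a' with 'a', 'b' with 'b', ...
total = s + other
print(total)
```

Here `total['a']` is 10 + 3, because both operands are aligned on the label `'a'` before the addition.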

In [4]:
# DataFrame
# Create a DataFrame from a dictionary of lists

data={
    'Name':['Dhrupad','Dhruv','Mithila'],
    'Age':[23,32,27],
    'City':['Kolkata','Bangalore','Pune']
}
df=pd.DataFrame(data)
print(df)
print(type(df))

      Name  Age       City
0  Dhrupad   23    Kolkata
1    Dhruv   32  Bangalore
2  Mithila   27       Pune
<class 'pandas.core.frame.DataFrame'>


In [5]:
# Read a CSV file into a DataFrame and show the first five rows
df=pd.read_csv('sales_data.csv')
df.head(5)

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal
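
For readers who do not have `sales_data.csv` at hand, `pd.read_csv` also accepts a file-like object. The sketch below is an assumption-laden stand-in, not the original file: it wraps two rows copied from the output above (with only a subset of the columns) in `io.StringIO`, and the names `csv_text` and `sample` are hypothetical.

```python
import io
import pandas as pd

# Two rows taken from the displayed output; only some columns are kept here
csv_text = """Transaction ID,Date,Product Category,Units Sold,Unit Price
10001,2024-01-01,Electronics,2,999.99
10002,2024-01-02,Home Appliances,1,499.99
"""

# read_csv treats the StringIO buffer like an open file
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (rows, columns)
print(list(sample.columns))  # column names taken from the header line
print(sample.head(1))        # head(n) returns the first n rows
```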


In [6]:
# Show the last five rows
df.tail(5)

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
235,10236,2024-08-23,Home Appliances,Nespresso Vertuo Next Coffee and Espresso Maker,1,159.99,159.99,Europe,PayPal
236,10237,2024-08-24,Clothing,Nike Air Force 1 Sneakers,3,90.0,270.0,Asia,Debit Card
237,10238,2024-08-25,Books,The Handmaid's Tale by Margaret Atwood,3,10.99,32.97,North America,Credit Card
238,10239,2024-08-26,Beauty Products,Sunday Riley Luna Sleeping Night Oil,1,55.0,55.0,Europe,PayPal
239,10240,2024-08-27,Sports,Yeti Rambler 20 oz Tumbler,2,29.99,59.98,Asia,Credit Card


In [7]:
# Re-create the small example DataFrame
data={
    'Name':['Dhrupad','Dhruv','Mithila'],
    'Age':[23,32,27],
    'City':['Kolkata','Bangalore','Pune']
}
df=pd.DataFrame(data)

In [8]:
# Select a column by name; the result is a Series
df['Name']

0    Dhrupad
1      Dhruv
2    Mithila
Name: Name, dtype: object

In [9]:
# Select a row by index label
df.loc[0]

Name    Dhrupad
Age          23
City    Kolkata
Name: 0, dtype: object

In [10]:
# Select a row by integer position
df.iloc[0]

Name    Dhrupad
Age          23
City    Kolkata
Name: 0, dtype: object
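
With the default integer index, `df.loc[0]` and `df.iloc[0]` return the same row, which hides the difference between them. The sketch below assumes a hypothetical frame `people` with the index labels 10, 20, 30 (not in the original notebook) so that labels and positions no longer coincide.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Dhrupad', 'Dhruv', 'Mithila'], 'Age': [23, 32, 27]},
    index=[10, 20, 30],  # illustrative labels that differ from positions
)

row_by_label = people.loc[10]     # the row whose index label is 10
row_by_position = people.iloc[0]  # the first row, whatever its label
print(row_by_label['Name'], row_by_position['Name'])

# Both accessors also take a column: label for .loc, position for .iloc
age_by_label = people.loc[20, 'Age']
age_by_position = people.iloc[1, 1]
print(age_by_label, age_by_position)

# There is no label 0 in this index, so .loc[0] raises KeyError
try:
    people.loc[0]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
print(missing_label_raised)
```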

In [11]:
# Add a column; the list length must match the number of rows

df['Salary']=[46000,60000,70000]
df

      Name  Age       City  Salary
0  Dhrupad   23    Kolkata   46000
1    Dhruv   32  Bangalore   60000
2  Mithila   27       Pune   70000


In [12]:
# Remove a column
df.drop(columns='Salary', inplace=True)
df

      Name  Age       City
0  Dhrupad   23    Kolkata
1    Dhruv   32  Bangalore
2  Mithila   27       Pune
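
The cell above modifies `df` in place. A common alternative is to leave the original untouched and work with the new DataFrame that `drop` returns when `inplace` is not set. This is a small sketch with hypothetical names (`df_demo`, `trimmed`), not a statement about which style the notebook's author prefers.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Dhrupad', 'Dhruv'],
    'Age': [23, 32],
    'Salary': [46000, 60000],
})

# Without inplace=True, drop returns a new DataFrame and leaves df_demo as it was
trimmed = df_demo.drop(columns=['Salary'])

print(list(df_demo.columns))  # still includes 'Salary'
print(list(trimmed.columns))  # 'Salary' removed
```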


In [13]:
# Update a column: add 1 to every value in 'Age'
df['Age']=df['Age']+1
df

      Name  Age       City
0  Dhrupad   24    Kolkata
1    Dhruv   33  Bangalore
2  Mithila   28       Pune
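
The same element-wise arithmetic works between two columns. As an illustration, the sketch below builds a small hypothetical frame `sales` from three unit counts and prices that appear in the sales data shown in this notebook, and derives a revenue column as units times price. Whether the real `Total Revenue` column was computed this way is an assumption.

```python
import pandas as pd

sales = pd.DataFrame({
    'Units Sold': [2, 1, 3],
    'Unit Price': [999.99, 499.99, 69.99],
})

# Multiply the two columns row by row; round to cents to tidy float noise
sales['Total Revenue'] = (sales['Units Sold'] * sales['Unit Price']).round(2)
print(sales)
```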


In [14]:
# Reload the sales data
df=pd.read_csv('sales_data.csv')
df.head(5)

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal


In [16]:
# Display data types of each column
print(df.dtypes)

Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Region               object
Payment Method       object
dtype: object
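
The `Date` column was read as plain strings, so date operations are not available on it yet. One way to convert it is `pd.to_datetime`. The sketch below uses a small hypothetical frame `sales` with dates in the same `YYYY-MM-DD` form as the output above; the `format` string is an assumption about the whole file based on the rows shown.

```python
import pandas as pd

sales = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'Units Sold': [2, 1, 3],
})

# Parse the strings into datetime values
sales['Date'] = pd.to_datetime(sales['Date'], format='%Y-%m-%d')
print(sales.dtypes)

# Datetime columns expose date parts through the .dt accessor
days = sales['Date'].dt.day.tolist()
months = sales['Date'].dt.month_name().tolist()
print(days, months)
```

Passing `parse_dates=['Date']` to `pd.read_csv` is another option that does the conversion at load time.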


In [17]:
# Summary statistics for the numeric columns
print(df.describe())

       Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000
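
The result of `describe()` is itself a DataFrame, indexed by statistic name, so individual values can be pulled out with `.loc`. A minimal sketch on a hypothetical four-row frame `sales` (the values are illustrative, not the full dataset):

```python
import pandas as pd

sales = pd.DataFrame({
    'Units Sold': [2, 1, 3, 4],
    'Unit Price': [999.99, 499.99, 69.99, 15.99],
})

summary = sales.describe()

# Rows of the summary are the statistics; columns are the numeric columns
print(list(summary.index))

mean_units = summary.loc['mean', 'Units Sold']
median_units = summary.loc['50%', 'Units Sold']  # the 50% row is the median
print(mean_units, median_units)
```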
