## Pandas: DataFrame and Series
Pandas is a Python library for data manipulation, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional labeled array that can hold any data type, while a DataFrame is a two-dimensional, size-mutable table whose columns can hold different data types and whose rows and columns are both labeled.
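
As a rough illustration of how the two structures relate, the sketch below builds a tiny DataFrame and pulls one column out of it. The names `df_demo` and `age` and the values are made up for this example.

```python
import pandas as pd

# Small illustrative table (values are invented for this sketch).
df_demo = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [30, 25]})

# Selecting one column returns a Series that shares the DataFrame's row index.
age = df_demo["Age"]
print(type(age).__name__)                # Series
print(age.index.equals(df_demo.index))   # True

# Each column keeps its own dtype, which is what "heterogeneous" refers to.
print(df_demo.dtypes)
```

In other words, a DataFrame can be read as a collection of Series that share one row index.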

In [53]:
import pandas as pd
import numpy as np

In [54]:
## Series
## A pandas Series is a one-dimensional, array-like object that can hold any data type. It is similar to a single column in a table.

data = [1,2,3,4,5]
series = pd.Series(data)
print("Series:\n", series)
print(type(series))

Series:
 0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


In [55]:
# Create a Series from a dictionary (the keys become the index labels)
data = {'a':1, 'b':2, 'c':3}
series_dict = pd.Series(data)
print(series_dict)

a    1
b    2
c    3
dtype: int64


In [56]:
# Create a Series with custom index labels
data = [10,20,30,40,50]
index = ['a','b','c','d','e']
pd.Series(data, index=index)


a    10
b    20
c    30
d    40
e    50
dtype: int64
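
Once a Series has custom labels, values can be reached either by label or by position. The following is a minimal sketch using the same values and labels as the cell above; the variable name `s` is chosen only for this example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

print(s["b"])        # 20  (lookup by label)
print(s.loc["b"])    # 20  (explicit label-based lookup)
print(s.iloc[1])     # 20  (explicit position-based lookup)

# Label slices include the end label; position slices exclude the end position.
print(s.loc["b":"d"].tolist())   # [20, 30, 40]
print(s.iloc[1:3].tolist())      # [20, 30]
```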

In [57]:
## DataFrame
## Create a DataFrame from a dictionary of lists

data = {
    'Name':['Pranoy', 'Tony', 'Steve', 'Peter'],
    'Age':[26,28,32,21],
    'City':['Siliguri', 'New York', 'California', 'New York']
}

df = pd.DataFrame(data)
df

     Name  Age        City
0  Pranoy   26    Siliguri
1    Tony   28    New York
2   Steve   32  California
3   Peter   21    New York


In [58]:
print(type(df))

<class 'pandas.core.frame.DataFrame'>


In [59]:
# Convert the DataFrame to a NumPy array (df.to_numpy() gives the same result)
np.array(df)

array([['Pranoy', 26, 'Siliguri'],
       ['Tony', 28, 'New York'],
       ['Steve', 32, 'California'],
       ['Peter', 21, 'New York']], dtype=object)
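
The array above has `dtype=object` because the frame mixes strings and integers, so NumPy has to fall back to generic Python objects. A small sketch, assuming a two-row frame named `df_demo` built just for this example, shows that converting only the numeric column keeps a numeric dtype:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Pranoy", "Tony"],
    "Age": [26, 28],
})

# Mixed string and integer columns: the array falls back to dtype object.
mixed = df_demo.to_numpy()
print(mixed.dtype)    # object

# Selecting only the numeric column first keeps an integer dtype.
ages = df_demo[["Age"]].to_numpy()
print(ages.shape)     # (2, 1)
print(ages.dtype.kind)  # i  (signed integer)
```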

In [60]:
## Create a DataFrame from a list of dictionaries

data=[
    {'Name':'Pranoy','Age':26,'City':'Siliguri'},
    {'Name':'John','Age':34,'City':'Bangalore'},
    {'Name':'Steve','Age':32,'City':'New York'},
    {'Name':'Jack','Age':32,'City':'Dubai'}    
]

df = pd.DataFrame(data)
df

     Name  Age       City
0  Pranoy   26   Siliguri
1    John   34  Bangalore
2   Steve   32   New York
3    Jack   32      Dubai


In [61]:
# Load the sales data from a CSV file in the current working directory
sales_df = pd.read_csv('sales_data.csv')

In [62]:
df

     Name  Age       City
0  Pranoy   26   Siliguri
1    John   34  Bangalore
2   Steve   32   New York
3    Jack   32      Dubai


In [63]:
sales_df.head(5)

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal


In [64]:
sales_df.tail(5)

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
235,10236,2024-08-23,Home Appliances,Nespresso Vertuo Next Coffee and Espresso Maker,1,159.99,159.99,Europe,PayPal
236,10237,2024-08-24,Clothing,Nike Air Force 1 Sneakers,3,90.0,270.0,Asia,Debit Card
237,10238,2024-08-25,Books,The Handmaid's Tale by Margaret Atwood,3,10.99,32.97,North America,Credit Card
238,10239,2024-08-26,Beauty Products,Sunday Riley Luna Sleeping Night Oil,1,55.0,55.0,Europe,PayPal
239,10240,2024-08-27,Sports,Yeti Rambler 20 oz Tumbler,2,29.99,59.98,Asia,Credit Card


In [65]:
### Accessing data from a DataFrame
df

     Name  Age       City
0  Pranoy   26   Siliguri
1    John   34  Bangalore
2   Steve   32   New York
3    Jack   32      Dubai


In [70]:
# Select a single column by name (returns a Series)
df['Name']

0    Pranoy
1      John
2     Steve
3      Jack
Name: Name, dtype: object

In [71]:
# Select a row by its index label
df.loc[0]

Name      Pranoy
Age           26
City    Siliguri
Name: 0, dtype: object

In [72]:
# Select a row by its integer position
df.iloc[0]

Name      Pranoy
Age           26
City    Siliguri
Name: 0, dtype: object
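
Here `loc[0]` and `iloc[0]` return the same row only because the default index labels (0, 1, 2, ...) coincide with the row positions. The sketch below uses a hypothetical frame `people` with custom row labels 10, 20, 30 to show where the two differ.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["Pranoy", "John", "Steve"], "Age": [26, 34, 32]},
    index=[10, 20, 30],   # hypothetical custom row labels
)

print(people.loc[10, "Name"])   # Pranoy  (row labelled 10)
print(people.iloc[0, 0])        # Pranoy  (first row, first column by position)
print(people.iloc[2, 0])        # Steve   (third row by position)

# .loc looks up labels, so asking for label 0 fails on this frame.
try:
    people.loc[0]
except KeyError:
    print("no row labelled 0")
```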

In [76]:
print(df.index)
print(df.columns)

RangeIndex(start=0, stop=4, step=1)
Index(['Name', 'Age', 'City'], dtype='object')


In [81]:
## Access a single element by row label and column name using at
df.at[2,'City']

'New York'

In [74]:
## Access a single element by row and column position using iat
df.iat[2,2]

'New York'
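
For a single cell, `at` and `iat` are the scalar counterparts of `loc` and `iloc`. The sketch below, on a small frame named `df_demo` created only for this example, shows all four returning the same value and `at` being used to overwrite one cell; the replacement city is an arbitrary illustrative value.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Pranoy", "John", "Steve"],
    "City": ["Siliguri", "Bangalore", "New York"],
})

# Label-based access to one cell: at and loc agree.
print(df_demo.at[2, "City"], "|", df_demo.loc[2, "City"])

# Position-based access to one cell: iat and iloc agree.
print(df_demo.iat[2, 1], "|", df_demo.iloc[2, 1])

# at also supports assignment to a single cell.
df_demo.at[1, "City"] = "Mumbai"   # arbitrary replacement value
print(df_demo.loc[1, "City"])      # Mumbai
```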

In [91]:
data=[
    {'Name':'Pranoy','Age':26,'City':'Siliguri'},
    {'Name':'John','Age':34,'City':'Bangalore'},
    {'Name':'Steve','Age':32,'City':'New York'},
    {'Name':'Jack','Age':32,'City':'Dubai'}    
]

df = pd.DataFrame(data)
df

     Name  Age       City
0  Pranoy   26   Siliguri
1    John   34  Bangalore
2   Steve   32   New York
3    Jack   32      Dubai


In [92]:
## Data manipulation with a DataFrame
### Add a new column (the list needs one value per row)
df['Salary'] = [50000, 65000, 120000, 38000]
df

     Name  Age       City  Salary
0  Pranoy   26   Siliguri   50000
1    John   34  Bangalore   65000
2   Steve   32   New York  120000
3    Jack   32      Dubai   38000


In [93]:
### Remove a column (axis=1 targets columns; inplace=True modifies df directly)
df.drop('Salary', axis=1, inplace=True)
df

     Name  Age       City
0  Pranoy   26   Siliguri
1    John   34  Bangalore
2   Steve   32   New York
3    Jack   32      Dubai
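
The cell above relies on `inplace=True` to change `df` itself. Without it, `drop` returns a new DataFrame and leaves the original untouched, as this minimal sketch shows; the names `df_demo` and `trimmed` are used only here.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Pranoy", "John"],
    "Age": [26, 34],
    "Salary": [50000, 65000],
})

# drop() without inplace=True returns a modified copy.
trimmed = df_demo.drop(columns="Salary")

print(list(trimmed.columns))   # ['Name', 'Age']
print(list(df_demo.columns))   # ['Name', 'Age', 'Salary']
```

Assigning the result to a new name, as done here, makes it harder to lose a column by accident when a cell is run more than once.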


In [95]:
## Add 1 to every value in the Age column
df['Age'] += 1
df

     Name  Age       City
0  Pranoy   27   Siliguri
1    John   35  Bangalore
2   Steve   33   New York
3    Jack   33      Dubai
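
Arithmetic on a column is applied to every row at once, and `+=` writes the result back into the same column, so running that cell again would add 1 again. The sketch below, using a throwaway frame `df_demo` and a hypothetical column name `Age Next Year`, keeps the original column intact instead.

```python
import pandas as pd

df_demo = pd.DataFrame({"Name": ["Pranoy", "John"], "Age": [26, 34]})

# The expression builds a new Series; the original column is unchanged.
next_year = df_demo["Age"] + 1
print(next_year.tolist())        # [27, 35]
print(df_demo["Age"].tolist())   # [26, 34]

# Storing the result under a new column name gives the same output
# no matter how many times the cell is re-run.
df_demo["Age Next Year"] = df_demo["Age"] + 1
print(df_demo["Age Next Year"].tolist())   # [27, 35]
```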


In [97]:
sales_df

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal
...,...,...,...,...,...,...,...,...,...
235,10236,2024-08-23,Home Appliances,Nespresso Vertuo Next Coffee and Espresso Maker,1,159.99,159.99,Europe,PayPal
236,10237,2024-08-24,Clothing,Nike Air Force 1 Sneakers,3,90.00,270.00,Asia,Debit Card
237,10238,2024-08-25,Books,The Handmaid's Tale by Margaret Atwood,3,10.99,32.97,North America,Credit Card
238,10239,2024-08-26,Beauty Products,Sunday Riley Luna Sleeping Night Oil,1,55.00,55.00,Europe,PayPal


In [None]:
# Display the data types of each column
print("Data types:\n", sales_df.dtypes)
print()

# Describe the DataFrame
print("Statistical summary:\n", sales_df.describe())

Data types:
 Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Region               object
Payment Method       object
dtype: object

Statistical summary:
        Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000
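
The data types listed above show `Date` as `object`, meaning the dates were read as plain strings. One possible way to turn them into real datetimes is `pd.to_datetime`, sketched here on a three-row sample copied from the first rows of the sales data; `sample` is a stand-in for `sales_df`.

```python
import pandas as pd

# Three rows copied from the head of the sales data shown earlier.
sample = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "Total Revenue": [1999.98, 499.99, 209.97],
})

# Convert the string column to datetime values.
sample["Date"] = pd.to_datetime(sample["Date"])

print(pd.api.types.is_datetime64_any_dtype(sample["Date"]))   # True
print(sample["Date"].dt.day.tolist())                         # [1, 2, 3]
```

Passing `parse_dates=['Date']` to `pd.read_csv` is another option that does the conversion while loading.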


In [101]:
sales_df.describe()

,Transaction ID,Units Sold,Unit Price,Total Revenue
count,240.0,240.0,240.0,240.0
mean,10120.5,2.158333,236.395583,335.699375
std,69.42622,1.322454,429.446695,485.804469
min,10001.0,1.0,6.5,6.5
25%,10060.75,1.0,29.5,62.965
50%,10120.5,2.0,89.99,179.97
75%,10180.25,3.0,249.99,399.225
max,10240.0,10.0,3899.99,3899.99
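
By default `describe()` summarises only the numeric columns, which is why `Region`, `Payment Method` and the other text columns are missing from the summary above. A minimal sketch, on a five-row sample taken from the head of the sales data, shows `include="all"` bringing the text column in as well:

```python
import pandas as pd

# Five rows copied from the head of the sales data shown earlier.
sample = pd.DataFrame({
    "Units Sold": [2, 1, 3, 4, 1],
    "Region": ["North America", "Europe", "Asia", "North America", "Europe"],
})

# Default: numeric columns only.
numeric_summary = sample.describe()
print(list(numeric_summary.columns))   # ['Units Sold']

# include="all" adds count / unique / top / freq rows for text columns.
full_summary = sample.describe(include="all")
print(list(full_summary.columns))           # ['Units Sold', 'Region']
print(full_summary.loc["unique", "Region"])  # 3
```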
