#### Pandas: DataFrame and Series
Pandas is a Python library for data manipulation, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional, labeled, array-like object. A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns); each of its columns is a Series.
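
To make those terms concrete, here is a minimal sketch with made-up values. The names `s` and `frame` are illustrative only and are not used elsewhere in these notes.

```python
import pandas as pd

# A Series: one dimension, one index.
s = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])
print(s.ndim, s.shape)          # 1 (3,)

# A DataFrame: two dimensions; columns may hold different types.
frame = pd.DataFrame({"label": ["p", "q", "r"], "value": [10, 20, 30]})
print(frame.ndim, frame.shape)  # 2 (3, 2)

# Selecting one column gives back a Series.
print(type(frame["value"]))

# "Size-mutable": a column can be added to an existing DataFrame.
frame["flag"] = [True, False, True]
print(frame.shape)              # (3, 3)
```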

In [3]:
import pandas as pd

In [3]:
## Create a Series from a list
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series:")
print(series)
print(type(series))

Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


In [4]:
## Create a Series from a dictionary (the keys become the index labels)

data = {
    'a':1,
    'b':2,
    'c':3
}
series_dict = pd.Series(data)
print(series_dict)

a    1
b    2
c    3
dtype: int64


In [5]:
## Create a Series with a custom index
data = [10, 20, 30]
index = ['a', 'b', 'c']
pd.Series(data, index=index)

a    10
b    20
c    30
dtype: int64
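
Once a Series has labels, values can be looked up either by label or by position. A small sketch of the difference, reusing the same three values (the variable name `s` is just for illustration):

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])                   # label-based lookup -> 20
print(s.iloc[1])                # position-based lookup -> 20
print(s.loc["a":"b"].tolist())  # label slices include the end label -> [10, 20]
print(s.iloc[0:1].tolist())     # position slices exclude the end -> [10]
```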

In [None]:
## DataFrame
## Create a DataFrame using a dictionary

data = {
    'Name':['Om','Dolly', 'Shubh'],
    'Age' :[20, 19, 21],
    'City' : ['Ghaziabad', 'RDC', 'Delhi']
}

df = pd.DataFrame(data)
print(df)
print(type(df))


    Name  Age       City
0     Om   20  Ghaziabad
1  Dolly   19        RDC
2  Shubh   21      Delhi
<class 'pandas.core.frame.DataFrame'>


In [10]:
## Convert the DataFrame to a NumPy array (mixed column types give dtype=object)
import numpy as np
np.array(df)

array([['Om', 20, 'Ghaziabad'],
       ['Dolly', 19, 'RDC'],
       ['Shubh', 21, 'Delhi']], dtype=object)
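
The `dtype=object` above comes from mixing strings and integers in one array. As a sketch of an alternative, `DataFrame.to_numpy()` performs the same conversion, and converting a single numeric column keeps a numeric dtype. The name `df_demo` is introduced here only for the example.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Om", "Dolly", "Shubh"],
    "Age": [20, 19, 21],
    "City": ["Ghaziabad", "RDC", "Delhi"],
})

mixed = df_demo.to_numpy()        # strings and integers together -> object array
ages = df_demo["Age"].to_numpy()  # one integer column keeps an integer dtype

print(mixed.dtype)
print(ages.dtype, ages.tolist())
```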

In [18]:
## Create a DataFrame from a list of dictionaries

data = [
    {'Name':'Om','Age':20 ,'City':'Ghaziabad'},
    {'Name':'Dolly','Age':19 ,'City':'RDC'},
    {'Name':'Shubh','Age':21 ,'City':'Delhi'}
]

df = pd.DataFrame(data)
print(df)
print(type(df))


    Name  Age       City
0     Om   20  Ghaziabad
1  Dolly   19        RDC
2  Shubh   21      Delhi
<class 'pandas.core.frame.DataFrame'>


In [13]:
## Read a CSV file into a DataFrame and show the first five rows
df = pd.read_csv('sales_data.csv')
df.head()

   Transaction ID        Date  Product Category             Product Name  Units Sold  Unit Price  Total Revenue         Region  Payment Method
0           10001  2024-01-01       Electronics            iPhone 14 Pro           2      999.99        1999.98  North America     Credit Card
1           10002  2024-01-02   Home Appliances         Dyson V11 Vacuum           1      499.99         499.99         Europe          PayPal
2           10003  2024-01-03          Clothing         Levi's 501 Jeans           3       69.99         209.97           Asia      Debit Card
3           10004  2024-01-04             Books        The Da Vinci Code           4       15.99          63.96  North America     Credit Card
4           10005  2024-01-05   Beauty Products  Neutrogena Skincare Set           1       89.99          89.99         Europe          PayPal


In [14]:
## Show the last five rows
df.tail()

     Transaction ID        Date  Product Category                                     Product Name  Units Sold  Unit Price  Total Revenue         Region  Payment Method
235           10236  2024-08-23   Home Appliances  Nespresso Vertuo Next Coffee and Espresso Maker           1      159.99         159.99         Europe          PayPal
236           10237  2024-08-24          Clothing                        Nike Air Force 1 Sneakers           3        90.0          270.0           Asia      Debit Card
237           10238  2024-08-25             Books           The Handmaid's Tale by Margaret Atwood           3       10.99          32.97  North America     Credit Card
238           10239  2024-08-26   Beauty Products             Sunday Riley Luna Sleeping Night Oil           1        55.0           55.0         Europe          PayPal
239           10240  2024-08-27            Sports                       Yeti Rambler 20 oz Tumbler           2       29.99          59.98           Asia     Credit Card
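
`sales_data.csv` is not bundled with these notes, so the following sketch reads a few rows from an in-memory string instead (three rows copied from the `head()` output). It assumes nothing about the rest of the file. `head(n)` and `tail(n)` take an optional row count, and `parse_dates` is one way to load the `Date` column as datetimes rather than strings.

```python
import io
import pandas as pd

# Three rows copied from the head() output, held in memory instead of a file.
csv_text = """Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sample.head(2))        # first two rows; head() defaults to 5
print(sample.tail(1))        # last row; tail() defaults to 5
print(sample["Date"].dtype)  # a datetime dtype instead of object
```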


In [19]:
### Accessing data from a DataFrame

df

    Name  Age       City
0     Om   20  Ghaziabad
1  Dolly   19        RDC
2  Shubh   21      Delhi


In [20]:
## Select a single column (returns a Series)
df['Name']

0       Om
1    Dolly
2    Shubh
Name: Name, dtype: object

In [21]:
## Select a row by its index label
df.loc[0]

Name           Om
Age            20
City    Ghaziabad
Name: 0, dtype: object

In [22]:
## Select a row by its integer position
df.iloc[0]

Name           Om
Age            20
City    Ghaziabad
Name: 0, dtype: object
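
`loc` and `iloc` return the same row above only because the default index labels (0, 1, 2) happen to match the row positions. A sketch with hypothetical labels 10, 20, 30 shows where they diverge:

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["Om", "Dolly", "Shubh"], "Age": [20, 19, 21]},
    index=[10, 20, 30],  # hypothetical row labels, chosen to differ from positions
)

print(people.loc[10, "Name"])   # label 10 -> Om
print(people.iloc[0]["Name"])   # position 0 -> Om
print(people.iloc[-1]["Name"])  # last position -> Shubh

# people.loc[0] would raise KeyError here, because no row is labelled 0.
```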

In [25]:
### Accessing a specific element by row label and column name

print(df.at[1,'Age'])

df.at[1,'Name']

19


'Dolly'
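
`at` has a positional counterpart, `iat`, and both can also be used to set a single cell. A brief sketch on a throwaway frame (`people` is an illustrative name):

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Om", "Dolly", "Shubh"], "Age": [20, 19, 21]})

print(people.at[1, "Age"])  # row label 1, column label 'Age' -> 19
print(people.iat[1, 1])     # row position 1, column position 1 -> 19

people.at[1, "Age"] = 25    # assign to a single cell
print(people.at[1, "Age"])  # -> 25
```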

In [26]:
## Data manipulation with a DataFrame: add a new column

df['Salary'] = [50000, 60000, 70000]

df

    Name  Age       City  Salary
0     Om   20  Ghaziabad   50000
1  Dolly   19        RDC   60000
2  Shubh   21      Delhi   70000


In [27]:
## Remove a column (drop returns a new DataFrame; df itself is unchanged)

df.drop('Salary', axis= 1)

    Name  Age       City
0     Om   20  Ghaziabad
1  Dolly   19        RDC
2  Shubh   21      Delhi


In [28]:
df

    Name  Age       City  Salary
0     Om   20  Ghaziabad   50000
1  Dolly   19        RDC   60000
2  Shubh   21      Delhi   70000


In [29]:
## inplace=True modifies df itself and returns None
df.drop('Salary', axis=1, inplace=True)

In [30]:
df

    Name  Age       City
0     Om   20  Ghaziabad
1  Dolly   19        RDC
2  Shubh   21      Delhi
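
An alternative to `inplace=True` is to keep the returned DataFrame under a name. The sketch below uses a small made-up frame (`staff`) and the `columns=` keyword, which is equivalent to passing `axis=1`:

```python
import pandas as pd

staff = pd.DataFrame({"Name": ["Om", "Dolly"], "Salary": [50000, 60000]})

trimmed = staff.drop(columns="Salary")  # same as staff.drop("Salary", axis=1)

print(list(staff.columns))    # original is unchanged -> ['Name', 'Salary']
print(list(trimmed.columns))  # -> ['Name']
```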


In [33]:
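## Add 1 to every value in the Age column
## (each run adds 1 again; the ages below are 2 higher than in In [30],
## which is consistent with this cell having been run twice)

df['Age'] = df['Age']+1
df

    Name  Age       City
0     Om   22  Ghaziabad
1  Dolly   21        RDC
2  Shubh   23      Delhi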
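
Arithmetic on a column applies to every row at once, and because the result is assigned back to the same column, re-running the cell changes the data again. A small sketch of that behaviour on a throwaway frame (`people` is an illustrative name):

```python
import pandas as pd

people = pd.DataFrame({"Age": [20, 19, 21]})

people["Age"] = people["Age"] + 1  # element-wise: every row gets +1
print(people["Age"].tolist())      # -> [21, 20, 22]

people["Age"] = people["Age"] + 1  # running the same statement again adds another 1
print(people["Age"].tolist())      # -> [22, 21, 23]
```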


In [34]:
## Drop the row with index label 0 (returns a new DataFrame)
df.drop(0)

    Name  Age   City
1  Dolly   21    RDC
2  Shubh   23  Delhi
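
Note that the remaining rows keep their original labels (1 and 2). If a fresh 0-based index is wanted, `reset_index(drop=True)` is one way to get it. A sketch on a throwaway frame:

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Om", "Dolly", "Shubh"], "Age": [20, 19, 21]})

remaining = people.drop(0)                     # copy without the row labelled 0
print(remaining.index.tolist())                # -> [1, 2]

renumbered = remaining.reset_index(drop=True)  # relabel rows starting from 0
print(renumbered.index.tolist())               # -> [0, 1]
print(len(people))                             # original still has 3 rows
```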


In [4]:
## Reload the sales data (df was reassigned to the small example above)
df = pd.read_csv('sales_data.csv')
df.head()

   Transaction ID        Date  Product Category             Product Name  Units Sold  Unit Price  Total Revenue         Region  Payment Method
0           10001  2024-01-01       Electronics            iPhone 14 Pro           2      999.99        1999.98  North America     Credit Card
1           10002  2024-01-02   Home Appliances         Dyson V11 Vacuum           1      499.99         499.99         Europe          PayPal
2           10003  2024-01-03          Clothing         Levi's 501 Jeans           3       69.99         209.97           Asia      Debit Card
3           10004  2024-01-04             Books        The Da Vinci Code           4       15.99          63.96  North America     Credit Card
4           10005  2024-01-05   Beauty Products  Neutrogena Skincare Set           1       89.99          89.99         Europe          PayPal


In [6]:
## Display the data type of each column

print("DataTypes:\n", df.dtypes, sep="")


## Summary statistics for the numeric columns

print("Stats Summary:\n", df.describe(), sep="")


## Group by a column and aggregate (commented out; uses column names from this dataset)

# grouped = df.groupby('Product Category')['Total Revenue'].mean()
# print("Mean Total Revenue by category:\n", grouped, sep="")


DataTypes:
Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Region               object
Payment Method       object
dtype: object
Stats Summary:
       Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000
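
The group-by step is commented out in the cell above, so no result is shown for the real data. As a sketch of how it works, the example below uses made-up rows that only borrow two column names from the dataset; the numbers are not taken from `sales_data.csv`.

```python
import pandas as pd

# Made-up rows; only the column names come from the dataset.
sales = pd.DataFrame({
    "Product Category": ["Books", "Books", "Clothing", "Clothing"],
    "Total Revenue": [10.0, 30.0, 100.0, 50.0],
})

# Split rows by category, then average Total Revenue within each group.
mean_by_category = sales.groupby("Product Category")["Total Revenue"].mean()
print(mean_by_category)
```

The result is a Series indexed by category, so a single group's value can be read with `mean_by_category["Books"]`.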
