
## Pandas: DataFrame and Series

Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
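
As a quick illustration of the difference between the two structures, the sketch below builds one of each and inspects their dimensions. The variable names `ages` and `people` are chosen here for illustration only.

```python
import pandas as pd

# A Series: one dimension, a single dtype, one index.
ages = pd.Series([23, 24, 25], name="Age")

# A DataFrame: two dimensions; each column can have its own dtype.
people = pd.DataFrame({
    "Name": ["Bibash", "Bibek", "Dipesh"],
    "Age": [23, 24, 25],
})

print(ages.ndim, ages.shape)      # 1 (3,)
print(people.ndim, people.shape)  # 2 (3, 2)

# Size-mutable: adding a column changes the shape of the same object.
people["Address"] = ["Kathmandu", "Lalitpur", "Bhaktapur"]
print(people.shape)               # (3, 3)
```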


In [2]:
import pandas as pd

In [19]:
## Series
## A Series is a one-dimensional, array-like object that holds a sequence of values together with an associated array of labels, called its index.
## It can hold any data type and is similar to a Python list or a NumPy array, except that every value carries an index label.

import pandas as pd
data = [1,2,3,4,5]
series = pd.Series(data)
print(type(series))
print(series)

<class 'pandas.core.series.Series'>
0    1
1    2
2    3
3    4
4    5
dtype: int64
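
The output above has two parts: the index on the left and the values on the right. A small sketch of how those parts can be inspected separately, and of element-wise arithmetic on the values (the name `doubled` is introduced here for illustration):

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

print(series.index)       # the default index: positions 0 through 4
print(series.to_numpy())  # the values as a NumPy array
print(series.dtype)       # the single dtype shared by all values

# Arithmetic is applied element by element, as with NumPy arrays.
doubled = series * 2
print(doubled.tolist())   # [2, 4, 6, 8, 10]
```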


In [5]:
## Create a Series from a dictionary; the keys become the index labels
data  = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
series = pd.Series(data)
print(series)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [8]:
## Create a Series with custom index labels
data = [10, 20, 30]
series = pd.Series(data, index=['a', 'b', 'c'])
print(series)


a    10
b    20
c    30
dtype: int64
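
With custom labels in place, values can be looked up either by label or by position. The following is a minimal sketch of both styles; the variable names are illustrative.

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = series.loc["b"]         # lookup by index label
by_position = series.iloc[1]       # lookup by integer position
print(by_label, by_position)       # 20 20

# Label slices include the end label; positional slices exclude the end.
label_slice = series.loc["a":"b"]
pos_slice = series.iloc[0:1]
print(label_slice.tolist())        # [10, 20]
print(pos_slice.tolist())          # [10]
```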


In [12]:
## DataFrame
## Create a DataFrame from a dictionary of lists

data = {
    'Name': ['Bibash', 'Bibek', 'Dipesh', 'Sudip'],
    'Age': [23, 24, 25, 26],
    'Address': ['Kathmandu', 'Lalitpur', 'Bhaktapur', 'Pokhara'],
}

## The index labels here are strings ('1' to '4'), not integers
df = pd.DataFrame(data, index=['1', '2', '3', '4'])
print(df)

     Name  Age    Address
1  Bibash   23  Kathmandu
2   Bibek   24   Lalitpur
3  Dipesh   25  Bhaktapur
4   Sudip   26    Pokhara


In [13]:
import numpy as np
np.array(df)

array([['Bibash', 23, 'Kathmandu'],
       ['Bibek', 24, 'Lalitpur'],
       ['Dipesh', 25, 'Bhaktapur'],
       ['Sudip', 26, 'Pokhara']], dtype=object)
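
The array above has `dtype=object` because the columns mix strings and integers, so NumPy falls back to a generic container. A sketch of an alternative, assuming a reasonably recent pandas where `to_numpy()` is available: convert only the columns needed, so numeric columns keep a numeric dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bibash", "Bibek", "Dipesh", "Sudip"],
    "Age": [23, 24, 25, 26],
    "Address": ["Kathmandu", "Lalitpur", "Bhaktapur", "Pokhara"],
})

print(df.dtypes)              # one dtype per column

mixed = df.to_numpy()         # whole frame: mixed columns give an object array
ages = df["Age"].to_numpy()   # single numeric column: stays an integer array

print(mixed.dtype, mixed.shape)
print(ages.dtype, ages.mean())
```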

In [21]:
## Create a DataFrame from a list of dictionaries

data = [
    {'Name': 'John', 'Age': 28, 'Country': 'USA'},
    {'Name': 'Anna', 'Age': 24, 'Country': 'UK'},
    {'Name': 'Tom', 'Age': 22, 'Country': 'USA'},
]
df = pd.DataFrame(data, index=['1', '2','3'])
print(df)

   Name  Age Country
1  John   28     USA
2  Anna   24      UK
3   Tom   22     USA


In [121]:
## Load a CSV file into a DataFrame and preview the first five rows
df = pd.read_csv('sales_data.csv')
df.head(5)


,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal


In [122]:
df.tail(5)

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
235,10236,2024-08-23,Home Appliances,Nespresso Vertuo Next Coffee and Espresso Maker,1,159.99,159.99,Europe,PayPal
236,10237,2024-08-24,Clothing,Nike Air Force 1 Sneakers,3,90.0,270.0,Asia,Debit Card
237,10238,2024-08-25,Books,The Handmaid's Tale by Margaret Atwood,3,10.99,32.97,North America,Credit Card
238,10239,2024-08-26,Beauty Products,Sunday Riley Luna Sleeping Night Oil,1,55.0,55.0,Europe,PayPal
239,10240,2024-08-27,Sports,Yeti Rambler 20 oz Tumbler,2,29.99,59.98,Asia,Credit Card


In [123]:
df.describe()

       Transaction ID  Units Sold   Unit Price  Total Revenue
count      240.000000  240.000000   240.000000     240.000000
mean     10120.500000    2.158333   236.395583     335.699375
std         69.426220    1.322454   429.446695     485.804469
min      10001.000000    1.000000     6.500000       6.500000
25%      10060.750000    1.000000    29.500000      62.965000
50%      10120.500000    2.000000    89.990000     179.970000
75%      10180.250000    3.000000   249.990000     399.225000
max      10240.000000   10.000000  3899.990000    3899.990000
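
Since `sales_data.csv` is not included here, the sketch below reproduces the same load-and-summarize steps on a small inline sample. The three rows are copied from the `head()` output above with some columns left out, and reading from `io.StringIO` stands in for the file on disk. Note that `describe()` on the full frame also summarizes `Transaction ID`, which is an identifier rather than a measurement, so selecting the columns of interest first may give a more useful summary.

```python
import io
import pandas as pd

# Inline stand-in for sales_data.csv (a subset of the columns shown above).
csv_text = """Transaction ID,Date,Product Category,Units Sold,Unit Price,Total Revenue
10001,2024-01-01,Electronics,2,999.99,1999.98
10002,2024-01-02,Home Appliances,1,499.99,499.99
10003,2024-01-03,Clothing,3,69.99,209.97
"""

# parse_dates converts the Date column from text to datetime values.
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(sales.head(2))
print(sales.dtypes)

# Summarize only the columns that are real measurements.
summary = sales[["Units Sold", "Total Revenue"]].describe()
print(summary)
```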


In [37]:
## Accessing data from a DataFrame
data = [
    {'Name': 'John', 'Age': 28, 'Country': 'USA'},
    {'Name': 'Anna', 'Age': 24, 'Country': 'UK'},
    {'Name': 'Tom', 'Age': 22, 'Country': 'USA'},
]
df = pd.DataFrame(data)
print(df)



   Name  Age Country
0  John   28     USA
1  Anna   24      UK
2   Tom   22     USA


In [36]:
print(df['Name'])
print(type(df['Name']))

0    John
1    Anna
2     Tom
Name: Name, dtype: object
<class 'pandas.core.series.Series'>


In [40]:
df.loc[0]

Name       John
Age          28
Country     USA
Name: 0, dtype: object

In [41]:
df.iloc[0]

Name       John
Age          28
Country     USA
Name: 0, dtype: object
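
`loc` and `iloc` return the same row above only because the default index labels (0, 1, 2) happen to match the row positions. A sketch with a non-default index, using hypothetical labels 10, 20, 30, shows how the two differ:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Anna", "Tom"], "Age": [28, 24, 22]},
    index=[10, 20, 30],  # hypothetical labels that differ from positions
)

first_by_position = df.iloc[0]  # row at position 0
row_by_label = df.loc[20]       # row whose index label is 20

print(first_by_position["Name"])  # John
print(row_by_label["Name"])       # Anna

# There is no row labeled 0, so a label lookup for 0 fails.
try:
    df.loc[0]
    found = True
except KeyError:
    found = False
print(found)                      # False
```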

In [62]:

df.loc[0]['Name']

'John'
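
`df.loc[0]['Name']` works for reading, but it performs two lookups in a row (first the row, then the element). A sketch of the single-step form, which passes the row and column labels to `loc` together and is also the safer form when assigning:

```python
import pandas as pd

df = pd.DataFrame([
    {"Name": "John", "Age": 28, "Country": "USA"},
    {"Name": "Anna", "Age": 24, "Country": "UK"},
    {"Name": "Tom", "Age": 22, "Country": "USA"},
])

# Row label and column label in one lookup.
name = df.loc[0, "Name"]
print(name)  # John

# Assignment through a single .loc call updates df itself.
df.loc[0, "Age"] = 29

# Lists of labels select several rows and columns at once.
subset = df.loc[[0, 2], ["Name", "Country"]]
print(subset)
```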

In [60]:
## Accessing a specific element by row label and column label using at
df.at[2,'Name']

'Tom'

In [61]:
## Accessing a specific element by integer position using iat
df.iat[2,2] # third row and third column (positions are zero-based)

'USA'
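
To make the relationship between `at` (labels) and `iat` (positions) concrete, the sketch below looks up the position of a column by name and confirms that both accessors reach the same cell. The variable `col_pos` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"Name": "John", "Age": 28, "Country": "USA"},
    {"Name": "Anna", "Age": 24, "Country": "UK"},
    {"Name": "Tom", "Age": 22, "Country": "USA"},
])

# Position of the "Country" column, counting from zero.
col_pos = df.columns.get_loc("Country")
print(col_pos)                 # 2

by_label = df.at[2, "Country"]
by_position = df.iat[2, col_pos]
print(by_label, by_position)   # USA USA

# at can also set a single value.
df.at[1, "Age"] = 25
print(df.at[1, "Age"])         # 25
```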

In [63]:
## Data manipulation with dataframe

df

   Name  Age Country
0  John   28     USA
1  Anna   24      UK
2   Tom   22     USA


In [64]:
df['Salary'] = [2000,3000,4000]
print(df)

   Name  Age Country  Salary
0  John   28     USA    2000
1  Anna   24      UK    3000
2   Tom   22     USA    4000


In [None]:
## Remove a column
df.drop('Salary', axis=1) ## returns a new DataFrame without the column; df itself is unchanged

   Name  Age Country
0  John   28     USA
1  Anna   24      UK
2   Tom   22     USA


In [69]:
df

   Name  Age Country  Salary
0  John   28     USA    2000
1  Anna   24      UK    3000
2   Tom   22     USA    4000


In [70]:
df.drop('Salary', axis=1, inplace=True) ## modifies df in place; the column is removed from df itself

In [71]:
df

   Name  Age Country
0  John   28     USA
1  Anna   24      UK
2   Tom   22     USA
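
An alternative to `inplace=True` is to assign the result of `drop` to a name, which keeps the original DataFrame available. The sketch below shows that pattern using the `columns=` keyword in place of `axis=1`. The `Bonus` column is hypothetical and deliberately absent, to show how `errors="ignore"` skips missing labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Tom"],
    "Age": [28, 24, 22],
    "Country": ["USA", "UK", "USA"],
    "Salary": [2000, 3000, 4000],
})

# drop returns a new DataFrame; df keeps its Salary column.
without_salary = df.drop(columns="Salary")
print(list(without_salary.columns))  # ['Name', 'Age', 'Country']
print("Salary" in df.columns)        # True

# errors="ignore" skips labels that do not exist ("Bonus" is hypothetical).
trimmed = df.drop(columns=["Salary", "Bonus"], errors="ignore")
print(list(trimmed.columns))         # ['Name', 'Age', 'Country']
```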


In [80]:
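## Increase every value in the Age column by 1
## (each run of this cell adds 1 again; the output below reflects repeated runs, e.g. 28 -> 34)
df["Age"]= df["Age"]+1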

print(df)

   Name  Age Country
0  John   34     USA
1  Anna   30      UK
2   Tom   28     USA
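
Because `df["Age"] = df["Age"] + 1` overwrites its own input, every rerun of the cell changes the result. One way to avoid this, sketched below, is to write the derived values to a separate column so the source column stays untouched. The column name `AgeNextYear` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Tom"],
    "Age": [28, 24, 22],
    "Country": ["USA", "UK", "USA"],
})

# assign returns a new DataFrame with the extra column; Age is not modified,
# so running these lines again produces the same result.
df = df.assign(AgeNextYear=df["Age"] + 1)
df = df.assign(AgeNextYear=df["Age"] + 1)  # simulated rerun

print(df["Age"].tolist())          # [28, 24, 22]
print(df["AgeNextYear"].tolist())  # [29, 25, 23]
```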


In [87]:
## Remove the row with index label 0 (rows are the default axis for drop)
df.drop(0,inplace=True)
print(df)

   Name  Age Country
1  Anna   30      UK
2   Tom   28     USA


In [116]:
df = pd.DataFrame({'Name': ['Sita', 'Rita'],
                   'Age': [24, 400],
                   'Country': ['Nepal', 'Nepal']})

print(f"original dataframe: \n {df} \n")

data = {'Name': 'Ram', 'Age': 28, 'Country': 'Nepal'}

# Convert dictionary to DataFrame and concatenate
df = pd.concat([df, pd.DataFrame([data])], ignore_index=True)
print(f"updated dataframe: \n{df} \n")

original dataframe: 
    Name  Age Country
0  Sita   24   Nepal
1  Rita  400   Nepal 

updated dataframe: 
   Name  Age Country
0  Sita   24   Nepal
1  Rita  400   Nepal
2   Ram   28   Nepal 
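
When several rows need to be added, one option is to collect them in a list and concatenate once, since each `pd.concat` call builds a new DataFrame. A minimal sketch follows; the second row (`Hari`) is invented for illustration and does not appear in the data above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sita", "Rita"],
    "Age": [24, 400],
    "Country": ["Nepal", "Nepal"],
})

# Collect the new rows first ("Hari" is a made-up example row).
new_rows = [
    {"Name": "Ram", "Age": 28, "Country": "Nepal"},
    {"Name": "Hari", "Age": 31, "Country": "Nepal"},
]

# One concat call; ignore_index renumbers the rows 0..n-1.
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(df)
```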



In [83]:
## Filtering data
## Filter the rows where Age is greater than 30
df[df['Age']>30]

## Filter the rows where Name is Tom
## (a notebook cell displays only its last expression, so only this result appears below)
df[df['Name']=="Tom"]



  Name  Age Country
2  Tom   28     USA
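
Filters can also be combined. The sketch below uses `&` (and), `|` (or), `isin`, and `query` on the John/Anna/Tom data with its original ages. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons in Python.

```python
import pandas as pd

df = pd.DataFrame([
    {"Name": "John", "Age": 28, "Country": "USA"},
    {"Name": "Anna", "Age": 24, "Country": "UK"},
    {"Name": "Tom", "Age": 22, "Country": "USA"},
])

# Both conditions must hold.
usa_over_25 = df[(df["Age"] > 25) & (df["Country"] == "USA")]
print(usa_over_25["Name"].tolist())  # ['John']

# Either condition may hold.
uk_or_young = df[(df["Country"] == "UK") | (df["Age"] < 23)]
print(uk_or_young["Name"].tolist())  # ['Anna', 'Tom']

# Membership test against a list of values.
selected = df[df["Name"].isin(["Anna", "Tom"])]
print(selected["Name"].tolist())     # ['Anna', 'Tom']

# The same "and" filter written as a query string.
via_query = df.query("Age > 25 and Country == 'USA'")
print(via_query["Name"].tolist())    # ['John']
```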


Output of df.describe() on the updated DataFrame (Sita, Rita, Ram):

              Age
count    3.000000
mean   150.666667
std    215.938263
min     24.000000
25%     26.000000
50%     28.000000
75%    214.000000
max    400.000000
