## Pandas: DataFrame and Series
Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional, labeled, array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [75]:
import pandas as pd

In [5]:
# Series
# A Series is a one-dimensional array-like object that can hold any data type. It has an index, which is used to access the elements.
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series:\n", series)
print(type(series))

Series:
 0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>
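
The comment in the cell above mentions that the index is used to access elements. A minimal sketch of what that looks like, assuming the same five-element Series (the names `first`, `last`, and `subset` are illustrative only):

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

# With the default integer index, labels and positions happen to coincide.
first = series[0]             # label-based lookup of the label 0
last = series.iloc[-1]        # position-based lookup of the last element
subset = series[series > 3]   # boolean mask keeps the elements greater than 3

print(first, last)
print(subset.tolist())
```

Using `.iloc` for positions keeps the intent explicit even when the labels are integers.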


In [6]:
# Create a Series from a dictionary
data = {
    'a': 1,
    'b': 2,
    'c': 3,
    'd': 4,
    'e': 5
}
series_dict = pd.Series(data)
print("Series from dictionary:\n", series_dict)

Series from dictionary:
 a    1
b    2
c    3
d    4
e    5
dtype: int64
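
When a Series is built from a dictionary, an explicit `index` can also be passed to select and order the keys. The sketch below assumes a shortened version of the dictionary above; the key `'z'` is a hypothetical label that is deliberately absent from it.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# The index argument selects and orders keys from the dictionary.
# 'z' is not a key in data, so its value becomes NaN (and the dtype becomes float).
picked = pd.Series(data, index=['c', 'a', 'z'])
print(picked)
```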


In [7]:
# Create a Series with a custom index
data = [10, 20, 30]
index = ['a', 'b', 'c']
pd.Series(data, index=index)

a    10
b    20
c    30
dtype: int64
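
A custom index changes how elements are looked up and how two Series combine. The following is a small sketch under the assumption that a second Series, here called `other` (not part of the original notebook), carries the same labels in a different order:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = s['b']         # label-based access
by_position = s.iloc[1]   # position-based access to the same element

# Arithmetic aligns values on index labels, not on position.
other = pd.Series([1, 2, 3], index=['c', 'b', 'a'])
total = s + other

print(by_label, by_position)
print(total)
```

Here `total['a']` is `10 + 3`, because the label `'a'` holds 3 in `other` even though it sits in the last position.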

In [54]:
# DataFrame
# A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

# Create a DataFrame from a dictionary of lists

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [49]:
import numpy as np
# Convert the DataFrame to a NumPy array; mixed column types produce dtype=object
np.array(df)

array([['Alice', 25, 'New York'],
       ['Bob', 30, 'Los Angeles'],
       ['Charlie', 35, 'Chicago']], dtype=object)
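
As an alternative to `np.array(df)`, pandas offers the `to_numpy()` method on both DataFrame and Series. A brief sketch, assuming the same three-row frame as above:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

mixed = df.to_numpy()         # mixed column types fall back to dtype=object
ages = df['Age'].to_numpy()   # a single numeric column keeps an integer dtype

print(mixed.dtype, mixed.shape)
print(ages.dtype, ages)
```

Converting only the numeric columns avoids the `object` dtype, which is slower for numerical work.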

In [50]:
# Create a dataframe from a list of dictionaries

data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'},
    {'Name': 'David', 'Age': 40, 'City': 'Houston'}
]
df_from_list = pd.DataFrame(data)
# Display the DataFrame created from a list of dictionaries
df_from_list

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston


In [17]:
df_from_list['Name']

0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object

In [18]:
type(df_from_list['Name'])

pandas.core.series.Series
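
Selecting with a single column name returns a Series, as shown above, while selecting with a list of names returns a DataFrame. A short sketch, assuming a two-row version of the frame (the variable names are illustrative):

```python
import pandas as pd

df_from_list = pd.DataFrame([
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
])

one_column = df_from_list['Name']             # single label -> Series
two_columns = df_from_list[['Name', 'Age']]   # list of labels -> DataFrame
still_a_frame = df_from_list[['Name']]        # one-element list -> one-column DataFrame

print(type(one_column).__name__, type(two_columns).__name__, type(still_a_frame).__name__)
```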

In [23]:
df_from_list.loc[1, 'Age']  # Access the 'Age' value in the row labeled 1

np.int64(30)

In [21]:
df_from_list.iloc[0]  # Accessing the first row using integer location

Name       Alice
Age           25
City    New York
Name: 0, dtype: object
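
With the default index, `loc` and `iloc` can look interchangeable because labels and positions match. The sketch below uses a hypothetical frame, `df_people`, whose index labels (10, 20, 30) were chosen to differ from the positions so the distinction is visible:

```python
import pandas as pd

df_people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical labels that differ from positions 0, 1, 2
)

by_label = df_people.loc[20, 'Age']    # row labeled 20, column 'Age'
by_position = df_people.iloc[1, 1]     # second row, second column (same cell)

label_slice = df_people.loc[10:20]     # label slices include both endpoints
position_slice = df_people.iloc[0:1]   # position slices exclude the end

print(by_label, by_position, len(label_slice), len(position_slice))
```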

In [42]:
sales_df = pd.read_csv('sales_data.csv')  # Example of reading a CSV file into a DataFrame
sales_df.head(5)

   Transaction ID        Date  Product Category    Product Name  Units Sold  Unit Price  Total Revenue  Region  Payment Method
0            1001  2023-01-01       Electronics      Smartphone          10         300           3000   North     Credit Card
1            1002  2023-01-02   Home Appliances  Vacuum Cleaner           5         150            750   South      Debit Card
2            1003  2023-01-03         Furniture    Office Chair           8         200           1600    East            Cash
3            1004  2023-01-04       Electronics          Laptop           3        1000           3000    West          Paypal
4            1005  2023-01-05   Home Appliances    Refrigerator           2        1200           2400   North     Credit Card
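
Since `sales_data.csv` is not included here, the following sketch reads an in-memory stand-in instead. The text in `csv_text` is an assumption: two rows and a subset of columns copied from the output above. It also shows two commonly used `read_csv` options, `parse_dates` and `index_col`.

```python
import io
import pandas as pd

# Hypothetical stand-in for sales_data.csv (two rows, five of the columns).
csv_text = """Transaction ID,Date,Product Category,Units Sold,Unit Price
1001,2023-01-01,Electronics,10,300
1002,2023-01-02,Home Appliances,5,150
"""

sample_df = pd.read_csv(
    io.StringIO(csv_text),        # any file path or file-like object works here
    parse_dates=['Date'],         # parse the Date column as datetimes
    index_col='Transaction ID',   # use the transaction ID as the row index
)
print(sample_df.dtypes)
```

With `index_col` set, rows can be retrieved by transaction ID, for example `sample_df.loc[1002]`.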


In [43]:
sales_df.at[2, 'Product Category']  # Accessing a specific cell using label-based indexing

'Furniture'

In [44]:
# Filter by product category
sales_df[sales_df['Product Category'] == 'Electronics']

    Transaction ID        Date  Product Category      Product Name  Units Sold  Unit Price  Total Revenue  Region  Payment Method
0             1001  2023-01-01       Electronics        Smartphone          10         300           3000   North     Credit Card
3             1004  2023-01-04       Electronics            Laptop           3        1000           3000    West          Paypal
6             1007  2023-01-07       Electronics            Tablet           6         250           1500    East            Cash
9             1010  2023-01-10       Electronics        Smartwatch          12         200           2400   South      Debit Card
12            1013  2023-01-13       Electronics        Headphones          15         100           1500   North     Credit Card
15            1016  2023-01-16       Electronics          Smart TV           2        1500           3000    West          Paypal
18            1019  2023-01-19       Electronics            Camera           4         600           2400    East            Cash
21            1022  2023-01-22       Electronics           Monitor           5         300           1500   South      Debit Card
24            1025  2023-01-25       Electronics  Wireless Speaker          11         150           1650   North     Credit Card
27            1028  2023-01-28       Electronics      Game Console           3         400           1200    West          Paypal
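
The filter above uses a single condition. Several conditions can be combined with `&` (and) and `|` (or), as long as each one is wrapped in parentheses. The sketch below assumes a small stand-in frame named `sales` with values taken from the first rows shown earlier, not the full CSV:

```python
import pandas as pd

# Small stand-in frame; not the full sales_data.csv.
sales = pd.DataFrame({
    'Product Category': ['Electronics', 'Home Appliances', 'Furniture', 'Electronics'],
    'Units Sold': [10, 5, 8, 3],
    'Region': ['North', 'South', 'East', 'West'],
})

# Each condition needs its own parentheses because & binds tighter than == and >=.
mask = (sales['Product Category'] == 'Electronics') & (sales['Units Sold'] >= 5)
big_electronics = sales[mask]

# isin() tests membership in a list of values.
north_or_east = sales[sales['Region'].isin(['North', 'East'])]

print(big_electronics)
print(north_or_east)
```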


In [57]:
df

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [61]:
# Accessing a specific element using iat
df.iat[1, 2]  # Element at row position 1 and column position 2

'Los Angeles'
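
`at` and `iat` are the scalar counterparts of `loc` and `iloc`: `at` takes labels, `iat` takes integer positions. A minimal sketch, assuming the same three-row frame; the replacement value `'San Diego'` is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

by_label = df.at[1, 'City']   # row label 1, column label 'City'
by_position = df.iat[1, 2]    # row position 1, column position 2 (same cell)

# Both accessors can also assign a single value; a copy keeps df untouched.
df_copy = df.copy()
df_copy.at[1, 'City'] = 'San Diego'  # hypothetical new value

print(by_label, by_position, df_copy.at[1, 'City'])
```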

In [62]:
# Data Manipulation with DataFrames
df

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [68]:
# Remove a column
# columns: label or list of labels of the columns to remove
# axis: 0 for rows, 1 for columns; only needed with `labels`, redundant when `columns` is given
# inplace: if True, modify the original DataFrame; if False (the default), return a new one
df.drop(columns=['City'])

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


In [69]:
df

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
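
The output above shows that `df` still contains `City`: without `inplace=True`, `drop` returns a new DataFrame and leaves the original alone. One common pattern, sketched here with illustrative variable names, is to assign the result to a new name. The same method removes rows through the `index` argument.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

without_city = df.drop(columns=['City'])   # new frame without the City column
without_first_row = df.drop(index=[0])     # new frame without the row labeled 0

print(list(df.columns))            # the original is unchanged
print(list(without_city.columns))
print(without_first_row.index.tolist())
```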


In [70]:
# Add 1 to every value in the Age column (returns a new Series; df is unchanged)
df['Age']+1

0    26
1    31
2    36
Name: Age, dtype: int64
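
The expression `df['Age'] + 1` only computes a new Series. To keep the result, it has to be assigned back to a column. A minimal sketch, where the derived column `AgeInMonths` is a hypothetical name used only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Overwrite the existing column with the incremented values.
df['Age'] = df['Age'] + 1

# assign() returns a new DataFrame with an extra column computed from existing ones.
df = df.assign(AgeInMonths=df['Age'] * 12)

print(df)
```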

In [71]:
df.head(5)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [72]:
df.dtypes

Name    object
Age      int64
City    object
dtype: object
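
Column dtypes can be used to pick columns or be converted explicitly. The sketch below, again assuming the three-row frame, uses `select_dtypes` to keep only numeric columns and `astype` to convert `Age` to floating point:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

numeric_only = df.select_dtypes(include='number')   # keeps only the Age column
age_as_float = df['Age'].astype('float64')          # returns a converted copy

print(list(numeric_only.columns))
print(age_as_float.dtype)
```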

In [73]:
df.describe()  # Summary statistics for the numeric columns (here only Age)

        Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0
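
The figures reported by `describe()` can be reproduced with individual Series methods, which helps clarify what each row means. A short sketch using the three ages from `df`:

```python
import pandas as pd

ages = pd.Series([25, 30, 35], name='Age')

mean_age = ages.mean()                  # arithmetic mean
std_age = ages.std()                    # sample standard deviation (ddof=1)
first_quartile = ages.quantile(0.25)    # linearly interpolated 25th percentile

print(mean_age, std_age, first_quartile)
```

Note that `std()` divides by `n - 1` by default, which is why three values spaced 5 apart give exactly 5.0.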


In [74]:
sales_df.describe()  # Summary statistics for the numeric columns of the sales DataFrame

       Transaction ID  Units Sold  Unit Price  Total Revenue
count            59.0        59.0        59.0           59.0
mean           1030.0    5.254237  421.355932    1346.779661
std         17.175564    3.143648  457.924318     780.412006
min            1001.0         1.0        20.0          120.0
25%            1015.5         3.0       150.0          775.0
50%            1030.0         5.0       250.0         1250.0
75%            1044.5         7.0       500.0         1700.0
max            1059.0        15.0      2000.0         3200.0
