### Pandas: DataFrame and Series
Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [1]:
import pandas as pd

In [3]:
## Series
# A Pandas Series is a one-dimensional, array-like object that can hold any data type.
# It is similar to a single column in a table.

data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print("series: \n", series)
print(type(series))  # Output: <class 'pandas.core.series.Series'>



series: 
 0    10
1    20
2    30
3    40
4    50
dtype: int64
<class 'pandas.core.series.Series'>


In [4]:
## Create a Series from a dictionary

data_dict = {'a': 10, 'b': 20, 'c': 30}
series_from_dict = pd.Series(data_dict)
print("\nSeries from dictionary:\n", series_from_dict)


Series from dictionary:
 a    10
b    20
c    30
dtype: int64


In [5]:
# Create a Series with custom index labels
data = [10, 20, 30]
index = ['a', 'b', 'c']
pd.Series(data, index=index)

a    10
b    20
c    30
dtype: int64
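
Custom index labels give two ways to reach the same value: by label or by integer position. The sketch below is a minimal illustration that reuses the values from the cell above; the variable names (`s`, `by_label`, `by_position`, `doubled`) are made up for this example.

```python
import pandas as pd

# Same values and labels as the cell above
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = s['b']        # label-based lookup
by_position = s.iloc[1]  # position-based lookup (second element)
doubled = s * 2          # arithmetic is element-wise and keeps the index labels

print(by_label, by_position)
print(doubled)
```

Using `.iloc` for positions avoids ambiguity when the index labels are themselves integers.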

In [6]:
## DataFrame
# A Pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). 
# It is similar to a spreadsheet or SQL table.

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)
print(type(df))

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
<class 'pandas.core.frame.DataFrame'>
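
Once a DataFrame exists, a few attributes describe its layout without printing the whole table. The following is a small sketch using the same data as the cell above; `df_demo` is a name chosen for this example so that it does not overwrite `df`.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

n_rows, n_cols = df_demo.shape       # (rows, columns)
column_names = list(df_demo.columns)  # column labels in order
mean_age = df_demo['Age'].mean()      # each column is a Series with its own methods

print(n_rows, n_cols)
print(column_names)
print(mean_age)
```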


In [7]:
# Create a DataFrame from a list of dictionaries

data_list = [
    {'Name': 'David', 'Age': 28, 'City': 'Miami'},
    {'Name': 'Eve', 'Age': 32, 'City': 'Seattle'},
    {'Name': 'Frank', 'Age': 45, 'City': 'Boston'}
]

df_list = pd.DataFrame(data_list)
print(df_list)

    Name  Age     City
0  David   28    Miami
1    Eve   32  Seattle
2  Frank   45   Boston
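
When the dictionaries in the list do not all share the same keys, pandas still builds the table and fills the gaps with missing values (`NaN`). The sketch below assumes a hypothetical second record with no `City` key to show this.

```python
import pandas as pd

records = [
    {'Name': 'David', 'Age': 28, 'City': 'Miami'},
    {'Name': 'Eve', 'Age': 32},  # hypothetical record without a 'City' key
]

df_missing = pd.DataFrame(records)
city_is_missing = df_missing['City'].isna().tolist()

print(df_missing)
print(city_is_missing)
```

Checking `isna()` right after loading is a quick way to spot records that were incomplete in the source data.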


In [8]:
df = pd.read_csv('sales_data.csv')  # Load the CSV file into a DataFrame
df.head()  # Display the first five rows of the DataFrame

   Transaction ID        Date  Product Category             Product Name  Units Sold  Unit Price  Total Revenue         Region  Payment Method
0           10001  2024-01-01       Electronics            iPhone 14 Pro           2      999.99        1999.98  North America     Credit Card
1           10002  2024-01-02   Home Appliances         Dyson V11 Vacuum           1      499.99         499.99         Europe          PayPal
2           10003  2024-01-03          Clothing         Levi's 501 Jeans           3       69.99         209.97           Asia      Debit Card
3           10004  2024-01-04             Books        The Da Vinci Code           4       15.99          63.96  North America     Credit Card
4           10005  2024-01-05   Beauty Products  Neutrogena Skincare Set           1       89.99          89.99         Europe          PayPal


In [9]:
df.tail() # Display the last few rows of the DataFrame

     Transaction ID        Date  Product Category                                     Product Name  Units Sold  Unit Price  Total Revenue         Region  Payment Method
235           10236  2024-08-23   Home Appliances  Nespresso Vertuo Next Coffee and Espresso Maker           1      159.99         159.99         Europe          PayPal
236           10237  2024-08-24          Clothing                        Nike Air Force 1 Sneakers           3       90.00         270.00           Asia      Debit Card
237           10238  2024-08-25             Books           The Handmaid's Tale by Margaret Atwood           3       10.99          32.97  North America     Credit Card
238           10239  2024-08-26   Beauty Products             Sunday Riley Luna Sleeping Night Oil           1       55.00          55.00         Europe          PayPal
239           10240  2024-08-27            Sports                       Yeti Rambler 20 oz Tumbler           2       29.99          59.98           Asia     Credit Card


In [11]:
df.info()  # Get information about the DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Transaction ID    240 non-null    int64  
 1   Date              240 non-null    object 
 2   Product Category  240 non-null    object 
 3   Product Name      240 non-null    object 
 4   Units Sold        240 non-null    int64  
 5   Unit Price        240 non-null    float64
 6   Total Revenue     240 non-null    float64
 7   Region            240 non-null    object 
 8   Payment Method    240 non-null    object 
dtypes: float64(2), int64(2), object(5)
memory usage: 17.0+ KB
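
The `info()` output shows that `Date` was read as generic text (`object`) rather than as a date type. One common way to fix that is `pd.to_datetime`, or the `parse_dates` argument of `read_csv`. The sketch below uses a tiny in-memory CSV (an assumption made so the example runs without `sales_data.csv`) with a few of the same columns.

```python
import io
import pandas as pd

# Small stand-in for sales_data.csv so the example is self-contained
csv_text = """Transaction ID,Date,Units Sold
10001,2024-01-01,2
10002,2024-01-02,1
"""

raw = pd.read_csv(io.StringIO(csv_text))
raw['Date'] = pd.to_datetime(raw['Date'])  # convert text to datetimes after loading

# Alternative: ask read_csv to parse the column while loading
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(raw.dtypes)
print(parsed['Date'].dt.day.tolist())
```

With a datetime column, accessors such as `.dt.month` and date-based filtering become available.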


In [14]:
df_list['Name']  # Accessing a single column

0    David
1      Eve
2    Frank
Name: Name, dtype: object

In [16]:
df_list.loc[0]  # Accessing a single row by label

Name    David
Age        28
City    Miami
Name: 0, dtype: object

In [18]:
df_list.iloc[0]  # Accessing a single row by integer location

Name    David
Age        28
City    Miami
Name: 0, dtype: object
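
With the default integer index, `loc[0]` and `iloc[0]` return the same row, which hides the difference between them. The sketch below uses a hypothetical frame (`people`) indexed by name to show that `loc` works with labels while `iloc` works with positions, and that their slices treat the end point differently.

```python
import pandas as pd

# Hypothetical frame indexed by name instead of 0, 1, 2
people = pd.DataFrame(
    {'Age': [28, 32, 45], 'City': ['Miami', 'Seattle', 'Boston']},
    index=['David', 'Eve', 'Frank'],
)

row_by_label = people.loc['Eve']  # select by index label
row_by_position = people.iloc[1]  # select by integer position

# loc slices include the end label; iloc slices exclude the end position
label_slice = people.loc['David':'Eve']
position_slice = people.iloc[0:1]

print(row_by_label)
print(len(label_slice), len(position_slice))
```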

In [19]:
df_list

    Name  Age     City
0  David   28    Miami
1    Eve   32  Seattle
2  Frank   45   Boston


In [20]:
## Accessing a specific element
df_list.at[2,'Age']  # Accessing element at row index 2 and column 'Age'

np.int64(45)

In [21]:
df_list.at[2,'Name']

'Frank'

In [23]:
## Accessing a specified element using iat

df_list.iat[2, 2]  # Accessing element at row position 2 and column position 2 (the 'City' column)

'Boston'
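
`at` takes labels and `iat` takes integer positions, so the two can be tied together by looking up a column's position with `columns.get_loc`. The sketch below rebuilds the same three-row table under the illustrative name `df_demo` and also shows that `at` can assign a single value.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['David', 'Eve', 'Frank'],
    'Age': [28, 32, 45],
    'City': ['Miami', 'Seattle', 'Boston'],
})

city_pos = df_demo.columns.get_loc('City')  # integer position of the 'City' column
via_iat = df_demo.iat[2, city_pos]          # position-based access
via_at = df_demo.at[2, 'City']              # label-based access to the same cell

df_demo.at[2, 'Age'] = 46                   # at (and iat) also support assignment

print(city_pos, via_iat, via_at)
print(df_demo)
```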

In [24]:
# Data Manipulation with DataFrames

df_list

    Name  Age     City
0  David   28    Miami
1    Eve   32  Seattle
2  Frank   45   Boston


In [25]:
## Adding a column
df_list['Salary'] = [50000, 60000, 70000]
df_list

    Name  Age     City  Salary
0  David   28    Miami   50000
1    Eve   32  Seattle   60000
2  Frank   45   Boston   70000


In [None]:
# Remove a column
# axis=1 targets columns (axis=0 targets rows); inplace=True modifies df_list directly instead of returning a new DataFrame
df_list.drop('Salary', axis=1, inplace=True)

In [28]:
df_list

    Name  Age     City
0  David   28    Miami
1    Eve   32  Seattle
2  Frank   45   Boston
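
Without `inplace=True`, `drop` leaves the original DataFrame untouched and returns a new one, which is often easier to reason about when re-running notebook cells. The sketch below is a minimal illustration on a hypothetical two-row frame; `drop(columns=...)` is an equivalent spelling of `drop(..., axis=1)`.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['David', 'Eve'],
    'Salary': [50000, 60000],
})

# Returns a new DataFrame; df_demo itself still has the 'Salary' column
without_salary = df_demo.drop(columns='Salary')

print(list(df_demo.columns))
print(list(without_salary.columns))
```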


In [30]:
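## Add 5 to every value in the Age column
# Note: re-running this cell adds 5 again each time. The output below reflects
# two runs of the cell (for example, David: 28 -> 33 -> 38).
df_list['Age'] = df_list['Age'] + 5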
df_list

    Name  Age     City
0  David   38    Miami
1    Eve   42  Seattle
2  Frank   55   Boston


In [31]:
df_list.drop(0, inplace=True)  # Drop the row with index label 0

In [32]:
df_list

    Name  Age     City
1    Eve   42  Seattle
2  Frank   55   Boston
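
After a row is dropped, the remaining rows keep their original index labels, so the index now starts at 1. If a fresh 0-based index is wanted, `reset_index(drop=True)` is one way to get it. The sketch below uses a hypothetical copy of the table (`df_demo`) rather than modifying `df_list`.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['David', 'Eve', 'Frank'],
    'Age': [28, 32, 45],
})

remaining = df_demo.drop(0)                    # labels 1 and 2 are left
renumbered = remaining.reset_index(drop=True)  # relabel rows as 0, 1

print(list(remaining.index))
print(renumbered)
```

Passing `drop=True` discards the old labels; without it, they would be kept as a new column named `index`.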


In [33]:
df_1 = pd.read_csv('sales_data.csv')
df_1.head()

   Transaction ID        Date  Product Category             Product Name  Units Sold  Unit Price  Total Revenue         Region  Payment Method
0           10001  2024-01-01       Electronics            iPhone 14 Pro           2      999.99        1999.98  North America     Credit Card
1           10002  2024-01-02   Home Appliances         Dyson V11 Vacuum           1      499.99         499.99         Europe          PayPal
2           10003  2024-01-03          Clothing         Levi's 501 Jeans           3       69.99         209.97           Asia      Debit Card
3           10004  2024-01-04             Books        The Da Vinci Code           4       15.99          63.96  North America     Credit Card
4           10005  2024-01-05   Beauty Products  Neutrogena Skincare Set           1       89.99          89.99         Europe          PayPal


In [34]:
# Display the data types of each column
print("Data types:\n", df_1.dtypes)

# Describe the DataFrame to get summary statistics
print("\nSummary statistics:\n", df_1.describe())

Data types:
 Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Region               object
Payment Method       object
dtype: object

Summary statistics:
        Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000


In [35]:
df.describe()

       Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000
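
By default `describe()` summarizes only the numeric columns, which is why `Region` and the other text columns are absent above. Calling `describe()` on a text column reports counts and the most frequent value instead, and `groupby` gives per-category totals. The sketch below uses a small hand-made frame (`sales`, an assumption so the example runs without the CSV) with column names borrowed from the dataset.

```python
import pandas as pd

# Illustrative stand-in for a few columns of the sales data
sales = pd.DataFrame({
    'Region': ['North America', 'Europe', 'Asia', 'Europe'],
    'Units Sold': [2, 1, 3, 1],
    'Total Revenue': [1999.98, 499.99, 209.97, 89.99],
})

# For a text column, describe() returns count, unique, top, and freq
region_summary = sales['Region'].describe()

# Total revenue per region
revenue_by_region = sales.groupby('Region')['Total Revenue'].sum()

print(region_summary)
print(revenue_by_region)
```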
