Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame.

A Series is a one-dimensional, array-like object with labels. A DataFrame is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). It is potentially heterogeneous, meaning each column can hold a different data type.

In [1]:
import pandas as pd
import numpy as np

In [6]:
# Series
# A pandas Series is a one-dimensional labeled array capable of holding any data type. It is similar to a column in a spreadsheet or a SQL table.

data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print("Series:\n", series)
print("type of series:", type(series))

# creating a series from a dictionary
dict_data = {'a': 100, 'b': 200, 'c': 300}
series_from_dict = pd.Series(dict_data)
print("\nSeries from dictionary:\n", series_from_dict)
print("type of series_from_dict:", type(series_from_dict))

Series:
 0    10
1    20
2    30
3    40
4    50
dtype: int64
type of series: <class 'pandas.core.series.Series'>

Series from dictionary:
 a    100
b    200
c    300
dtype: int64
type of series_from_dict: <class 'pandas.core.series.Series'>


In [7]:
data = [10, 20, 30]
index_of_data = ['a', 'b', 'c']
pd.Series(data, index=index_of_data)

a    10
b    20
c    30
dtype: int64
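
The custom index above is more than a display label: values can be looked up by it, and arithmetic between two Series lines up on matching labels. The sketch below illustrates both points. The second Series, `other`, is a hypothetical example that is not part of the notebook.

```python
import pandas as pd

# Same values and labels as the cell above
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Look up a value by its label, or by integer position with .iloc
by_label = s['b']
by_position = s.iloc[1]
print(by_label, by_position)

# Arithmetic between two Series aligns on labels, not on position.
# 'other' is a hypothetical Series whose index only partly overlaps with s.
other = pd.Series([1, 2, 3], index=['b', 'c', 'd'])
total = s + other

# Labels present in only one of the two Series ('a' and 'd') become NaN
print(total)
```

Because of the missing values, the result is stored as floats even though both inputs hold integers.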

In [10]:
# DataFrame
# A pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a table in a relational database or an Excel spreadsheet.

# Creating a DataFrame from a dictionary of lists
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
} 
pf = pd.DataFrame(data)
print("\nDataFrame:\n",pf)
print("type of DataFrame:", type(pf))


DataFrame:
       Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
type of DataFrame: <class 'pandas.core.frame.DataFrame'>


In [12]:
np.array(data)

array({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']},
      dtype=object)
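
The output above shows that `np.array` does not unpack a dictionary: it wraps the whole dict in a zero-dimensional object array. If the goal is a 2-D NumPy array of the table's values, one possible route is to build the DataFrame first and call `to_numpy()`. This is a sketch of that approach, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

# np.array on a dict stores the dict itself as a single object
wrapped = np.array(data)
print(wrapped.shape)  # an empty tuple: the array has zero dimensions

# Going through a DataFrame gives one row per record and one column per key
table = pd.DataFrame(data).to_numpy()
print(table.shape)
print(table.dtype)  # object, because the columns mix text and integers

# A single numeric column converts to a numeric array
ages = pd.DataFrame(data)['Age'].to_numpy()
print(ages)
```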

In [2]:
# Create a DataFrame from a list of dictionaries
data = [
    {'Name':'Alice','Age': 43, 'City':'Banglore'},
    {'Name':'Bob','Age': 25, 'City':'New York'},
    {'Name':'Charlie','Age': 37, 'City':'San Francisco'}
]

pd.DataFrame(data)
print("\nDataFrame from list of dictionaries:\n", pd.DataFrame(data))


DataFrame from list of dictionaries:
       Name  Age           City
0    Alice   43       Banglore
1      Bob   25       New York
2  Charlie   37  San Francisco
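
When a DataFrame is built from a list of dictionaries, the dictionaries do not need to share exactly the same keys. The sketch below uses a hypothetical variant of the records above with some keys left out, to show how pandas fills the gaps.

```python
import pandas as pd

# Hypothetical records: two of them are missing a key on purpose
records = [
    {'Name': 'Alice', 'Age': 43, 'City': 'Banglore'},
    {'Name': 'Bob', 'Age': 25},                    # no 'City' key
    {'Name': 'Charlie', 'City': 'San Francisco'},  # no 'Age' key
]

people = pd.DataFrame(records)

# Missing entries show up as NaN
print(people)

# A numeric column that contains NaN is stored as float64
print(people.dtypes)
```

Columns appear in the order in which their keys are first seen across the records.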


In [45]:
# Read data from a CSV file into a DataFrame (pandas has similar readers for other formats, e.g. pd.read_excel and pd.read_json)
# df = pd.read_csv('path_to_your_file.csv')
df = pd.read_csv('sales_data.csv')
print("\nDataFrame from CSV file:\n", df.head())  # head() shows the first 5 rows by default
df.head(6)  # Pass a number to choose how many rows to show


DataFrame from CSV file:
    Transaction ID        Date Product Category             Product Name  \
0           10001  2024-01-01      Electronics            iPhone 14 Pro   
1           10002  2024-01-02  Home Appliances         Dyson V11 Vacuum   
2           10003  2024-01-03         Clothing         Levi's 501 Jeans   
3           10004  2024-01-04            Books        The Da Vinci Code   
4           10005  2024-01-05  Beauty Products  Neutrogena Skincare Set   

   Units Sold  Unit Price  Total Revenue         Region Payment Method  
0           2      999.99        1999.98  North America    Credit Card  
1           1      499.99         499.99         Europe         PayPal  
2           3       69.99         209.97           Asia     Debit Card  
3           4       15.99          63.96  North America    Credit Card  
4           1       89.99          89.99         Europe         PayPal  


Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal
5,10006,2024-01-06,Sports,Wilson Evolution Basketball,5,29.99,149.95,Asia,Credit Card
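
`read_csv` accepts any file-like object, so its options can be tried without the file on disk. The sketch below copies three rows from the output above into an in-memory string and parses the Date column as dates. It assumes that `sales_data.csv` has this header row and no separate index column, which the notebook does not show directly.

```python
import io
import pandas as pd

# Three rows copied from the output above; the header layout is an assumption
csv_text = """Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
"""

# io.StringIO makes the string behave like an open file
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Date is now a datetime column instead of plain text
print(sales.dtypes)
print(sales.head(2))
```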


In [5]:
df.tail()  # Display the last 5 rows by default; pass a number, e.g. df.tail(10), to change how many

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
235,10236,2024-08-23,Home Appliances,Nespresso Vertuo Next Coffee and Espresso Maker,1,159.99,159.99,Europe,PayPal
236,10237,2024-08-24,Clothing,Nike Air Force 1 Sneakers,3,90.0,270.0,Asia,Debit Card
237,10238,2024-08-25,Books,The Handmaid's Tale by Margaret Atwood,3,10.99,32.97,North America,Credit Card
238,10239,2024-08-26,Beauty Products,Sunday Riley Luna Sleeping Night Oil,1,55.0,55.0,Europe,PayPal
239,10240,2024-08-27,Sports,Yeti Rambler 20 oz Tumbler,2,29.99,59.98,Asia,Credit Card
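
Both `head()` and `tail()` take an optional row count. The sketch below uses a hypothetical 10-row DataFrame, chosen only so the selected rows are easy to verify, and also shows that a negative count means "everything except that many rows".

```python
import pandas as pd

# Hypothetical 10-row frame; the column name 'n' is illustrative
numbers = pd.DataFrame({'n': range(10)})

first_three = numbers.head(3)        # rows 0, 1, 2
last_two = numbers.tail(2)           # rows 8, 9
all_but_last_two = numbers.head(-2)  # rows 0 through 7

print(first_three)
print(last_two)
print(len(all_but_last_two))
```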


In [17]:
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,City
0,Alice,43,Banglore
1,Bob,25,New York
2,Charlie,37,San Francisco


In [22]:
# How to access data in a DataFrame
df = pd.DataFrame(data)
df['Name']  # Accessing a single column returns a Series
df[['Name', 'Age']]  # Accessing multiple columns with a list of names returns a DataFrame

Unnamed: 0,Name,Age
0,Alice,43
1,Bob,25
2,Charlie,37
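
Whether a selection is a Series or a DataFrame depends on the brackets used, not on how many columns are picked. The sketch below checks this with `type` and `shape`. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    {'Name': 'Alice', 'Age': 43, 'City': 'Banglore'},
    {'Name': 'Bob', 'Age': 25, 'City': 'New York'},
    {'Name': 'Charlie', 'Age': 37, 'City': 'San Francisco'},
])

# Single brackets with one column name: a one-dimensional Series
name_series = df['Name']

# Double brackets (a list of names): a DataFrame, even with one column
name_frame = df[['Name']]

print(type(name_series).__name__, name_series.shape)
print(type(name_frame).__name__, name_frame.shape)
```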


In [23]:
df.loc[0]   # Accessing a single row by index label
df.loc[0:1] # Accessing multiple rows by label; unlike Python slicing, .loc includes the end label

Unnamed: 0,Name,Age,City
0,Alice,43,Banglore
1,Bob,25,New York
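
With the default integer index, `loc` and `iloc` look alike, but they slice differently: `loc` works with labels and includes the end label, while `iloc` works with positions and excludes the end position. The sketch below compares the two. Using Name as the index is an illustrative choice that is not part of the notebook.

```python
import pandas as pd

df = pd.DataFrame([
    {'Name': 'Alice', 'Age': 43, 'City': 'Banglore'},
    {'Name': 'Bob', 'Age': 25, 'City': 'New York'},
    {'Name': 'Charlie', 'Age': 37, 'City': 'San Francisco'},
])

by_label = df.loc[0:1]      # labels 0 and 1, end label included: 2 rows
by_position = df.iloc[0:1]  # position 0 only, end position excluded: 1 row
print(len(by_label), len(by_position))

# With text labels the difference is easier to see
named = df.set_index('Name')
label_slice = named.loc['Alice':'Bob']  # both endpoints included
position_slice = named.iloc[0:2]        # positions 0 and 1
print(label_slice)
```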


In [25]:
df.iloc[0]  # Accessing a single row by integer location

Name       Alice
Age           43
City    Banglore
Name: 0, dtype: object

In [26]:
df.iloc[0, 2]  # Accessing a specific value by row and column position in a single call

'Banglore'

In [27]:
# Accessing a specific element
df.at[0, 'Name']  # Accessing a specific value by label/index

'Alice'

In [28]:
df.at[1, 'Age']  # Accessing another specific value by label/index

25

In [29]:
df.iat[2, 2]    # Accessing a specific value by integer location

'San Francisco'
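
`at` and `iat` are scalar accessors: each call reads or writes exactly one cell, addressed by label (`at`) or by integer position (`iat`). The sketch below reads the same cells as the cells above and then writes two new values. The new values (26 and 'Seattle') are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([
    {'Name': 'Alice', 'Age': 43, 'City': 'Banglore'},
    {'Name': 'Bob', 'Age': 25, 'City': 'New York'},
    {'Name': 'Charlie', 'Age': 37, 'City': 'San Francisco'},
])

# Reading one cell: by position with iloc or iat, by label with at
city_iloc = df.iloc[0, 2]
city_iat = df.iat[0, 2]
age_at = df.at[1, 'Age']
print(city_iloc, city_iat, age_at)

# Writing one cell works the same way (the new values are made up)
df.at[1, 'Age'] = 26
df.iat[2, 2] = 'Seattle'
print(df)
```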

In [30]:
# Data manipulation with dataframes
df

Unnamed: 0,Name,Age,City
0,Alice,43,Banglore
1,Bob,25,New York
2,Charlie,37,San Francisco


In [31]:
# Add new columns
df['Country'] = ['India', 'USA', 'USA']
df['Salary'] = [70000, 80000, 90000]
df

Unnamed: 0,Name,Age,City,Country,Salary
0,Alice,43,Banglore,India,70000
1,Bob,25,New York,USA,80000
2,Charlie,37,San Francisco,USA,90000
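
Assigning a list, as in the cell above, requires one value per row. A new column can also come from a single scalar or from an existing column. The sketch below shows those two variants and what happens when the list length does not match. The column name 'Age in 5 Years' is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([
    {'Name': 'Alice', 'Age': 43, 'City': 'Banglore'},
    {'Name': 'Bob', 'Age': 25, 'City': 'New York'},
    {'Name': 'Charlie', 'Age': 37, 'City': 'San Francisco'},
])

# A scalar is repeated for every row
df['Country'] = 'USA'

# A column can be derived from an existing one
df['Age in 5 Years'] = df['Age'] + 5

# A list with the wrong number of values is rejected
error_message = None
try:
    df['Salary'] = [70000, 80000]  # only 2 values for 3 rows
except ValueError as exc:
    error_message = str(exc)
    print("Could not add column:", error_message)

print(df)
```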


In [None]:
# Remove a column
df.drop('Salary', axis=1, inplace=True)  # axis=1 drops a column, axis=0 drops a row; inplace=True modifies df directly instead of returning a new DataFrame
df

Unnamed: 0,Name,Age,City,Country
0,Alice,43,Banglore,India
1,Bob,25,New York,USA
2,Charlie,37,San Francisco,USA
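
Without `inplace=True`, `drop` leaves the original DataFrame untouched and returns a modified copy. The sketch below shows that behavior using the `columns=` keyword, an alternative spelling of `axis=1`, along with `errors='ignore'` for cells that may be run more than once.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [43, 25, 37],
    'Salary': [70000, 80000, 90000],
})

# drop returns a new DataFrame; df itself still has the Salary column
without_salary = df.drop(columns=['Salary'])
print(list(df.columns))
print(list(without_salary.columns))

# Dropping a column that is already gone raises KeyError by default;
# errors='ignore' turns that case into a no-op
still_fine = without_salary.drop(columns=['Salary'], errors='ignore')
print(list(still_fine.columns))
```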


In [33]:
# Add 5 to every value in the Age column
df['Age'] = df['Age'] + 5
df

Unnamed: 0,Name,Age,City,Country
0,Alice,48,Banglore,India
1,Bob,30,New York,USA
2,Charlie,42,San Francisco,USA


In [None]:
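# Remove a row
df.drop(1, axis=0, inplace=True)  # Drops the row whose index label is 1
df.reset_index(drop=True, inplace=True)  # Renumber the index from 0 after dropping a row
# Re-running this cell can drop another row, because reset_index makes label 1 available again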

In [42]:
df

Unnamed: 0,Name,Age,City,Country
0,Charlie,42,San Francisco,USA
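
Dropping a row leaves a gap in the index, which is why the cell above calls `reset_index`. The sketch below runs the same two steps once on a fresh three-row DataFrame and without `inplace`, so each intermediate result can be inspected. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [48, 30, 42],
})

# Drop the row labeled 1; the remaining labels keep their old values
dropped = df.drop(1, axis=0)
print(list(dropped.index))

# drop=True discards the old labels instead of saving them as a column
renumbered = dropped.reset_index(drop=True)
print(list(renumbered.index))
print(renumbered)
```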


In [46]:
df

Unnamed: 0,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal
...,...,...,...,...,...,...,...,...,...
235,10236,2024-08-23,Home Appliances,Nespresso Vertuo Next Coffee and Espresso Maker,1,159.99,159.99,Europe,PayPal
236,10237,2024-08-24,Clothing,Nike Air Force 1 Sneakers,3,90.00,270.00,Asia,Debit Card
237,10238,2024-08-25,Books,The Handmaid's Tale by Margaret Atwood,3,10.99,32.97,North America,Credit Card
238,10239,2024-08-26,Beauty Products,Sunday Riley Luna Sleeping Night Oil,1,55.00,55.00,Europe,PayPal


In [47]:
#some more operations with dataframes
df.describe()  # Get summary statistics of numerical columns

Unnamed: 0,Transaction ID,Units Sold,Unit Price,Total Revenue
count,240.0,240.0,240.0,240.0
mean,10120.5,2.158333,236.395583,335.699375
std,69.42622,1.322454,429.446695,485.804469
min,10001.0,1.0,6.5,6.5
25%,10060.75,1.0,29.5,62.965
50%,10120.5,2.0,89.99,179.97
75%,10180.25,3.0,249.99,399.225
max,10240.0,10.0,3899.99,3899.99
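
The result of `describe()` is itself a DataFrame, so individual statistics can be read with `loc`. The sketch below uses a small sample built from the first six rows shown earlier (Units Sold, Unit Price and Region only) rather than the full file, so its numbers differ from the output above.

```python
import pandas as pd

# Values taken from the first six rows of the sales data shown earlier
sample = pd.DataFrame({
    'Units Sold': [2, 1, 3, 4, 1, 5],
    'Unit Price': [999.99, 499.99, 69.99, 15.99, 89.99, 29.99],
    'Region': ['North America', 'Europe', 'Asia',
               'North America', 'Europe', 'Asia'],
})

# Only the numeric columns are summarized; Region is left out
stats = sample.describe()
print(stats)

# Rows of the summary are labeled by statistic name
mean_units = stats.loc['mean', 'Units Sold']
median_units = stats.loc['50%', 'Units Sold']
print(mean_units, median_units)
```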
