## Pandas Series:

A Pandas Series is a one-dimensional, labeled, array-like data structure. Every Series has a single dtype (for example int64, float64, or bool); when the values are of mixed types, pandas stores them under the generic object dtype. It's similar to a column in a spreadsheet or to a dictionary, with a label (index) associated with each element; labels are usually unique, though pandas does not require it. The Series provides powerful indexing and data alignment capabilities.
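
As a minimal sketch of label-based access and index alignment (the variable names here are illustrative and do not come from the rest of this notebook):

```python
import pandas as pd

# Two Series with partially overlapping labels
s1 = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
s2 = pd.Series([1, 2, 3], index=['B', 'C', 'D'])

# Label-based access
b_value = s1['B']

# Arithmetic aligns on labels, not positions; labels missing
# from either side produce NaN in the result
total = s1 + s2

# A Series has one dtype; mixed values fall back to 'object'
mixed = pd.Series([1, 'two', 3.0])

print(b_value)
print(total)
print(mixed.dtype)
```

Only the labels 'B' and 'C' exist in both Series, so only those positions get a numeric sum.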

## Pandas DataFrame:

A Pandas DataFrame is a two-dimensional labeled data structure that consists of columns, each of which can hold different data types. It's similar to a table in a relational database or a spreadsheet, where you have rows and columns. Each column can be thought of as a Pandas Series, and all columns share the same index, allowing for efficient data alignment and manipulation.
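
The relationship between a DataFrame and its columns can be sketched as follows. The `people` frame and its string index are assumptions made for illustration only:

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]},
    index=['a', 'b', 'c'],
)

# Selecting one column returns a Series
ages = people['Age']
print(type(ages))

# Every column carries the same row index as the DataFrame
print(ages.index.equals(people.index))

# Each column has its own dtype
print(people.dtypes)
```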

# Difference between Series and DataFrame:

The main differences between Pandas Series and DataFrame are:

## Dimensionality:

Series: One-dimensional (like a list or array with labels), with a single index.

DataFrame: Two-dimensional (tabular structure with labeled axes), with both row and column indices.

## Number of Columns:

Series: Only one column of data.

DataFrame: Multiple columns of data.

## Flexibility:

Series: Limited to a single column of data with one dtype.

DataFrame: Flexible for holding multiple columns and heterogeneous data types.

#### Importing Required Modules and Libraries

In [1]:
# !pip install pandas
# !pip install sqlalchemy

import pandas as pd
import random
import sqlalchemy as sa
from sqlalchemy.engine import URL
from sqlalchemy import create_engine

#### Creating Series:

In [2]:
# Creating a Series
series_data = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])
print(series_data)

A    10
B    20
C    30
D    40
E    50
dtype: int64


#### Creating a DataFrame

In [3]:
# Creating a DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22]
}


df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age
0,Alice,25
1,Bob,30
2,Charlie,22


#### Writing DataFrame to CSV
1. Import the Pandas library using import pandas as pd.

2. Create a sample DataFrame df using a Python dictionary data.

3. Use the df.to_csv() method to write the DataFrame to a CSV file named 'data_1.csv'. The index=False argument prevents the index column from being included in the CSV file.
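
To see what index=False changes, here is a small sketch that relies on to_csv() returning the CSV text as a string when no path is given. The `df_small` frame is an illustrative assumption:

```python
import pandas as pd

df_small = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With the default index=True, the row index becomes an unnamed first column
with_index = df_small.to_csv()

# With index=False, only the data columns are written
without_index = df_small.to_csv(index=False)

print(with_index)
print(without_index)
```

The unnamed first column is also why a file written with the index and then read back shows an extra "Unnamed: 0" column.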

In [7]:
import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22]
}
df = pd.DataFrame(data)

# Write DataFrame to a CSV file
df.to_csv('data_1.csv', index=False)

#### Reading CSV file
1. Import the Pandas library using import pandas as pd.

2. Use the pd.read_csv() function to read the CSV file named 'data_1.csv' into a Pandas DataFrame. Replace 'data_1.csv' with the actual path to your CSV file.

3. Preview the loaded data with df.head() (first rows) or df.tail() (last rows). The cell below uses df.tail(2) to show the last two rows.

In [11]:
import pandas as pd

# Read CSV file into a DataFrame
df = pd.read_csv('data_1.csv')

# Display the last two rows of the DataFrame
df.tail(2)


Unnamed: 0,Name,Age
1,Bob,30
2,Charlie,22


#### Writing DataFrame to Excel:
1. Import the Pandas library using import pandas as pd.

2. Create a sample DataFrame df using a Python dictionary data.

3. Use the df.to_excel() method to write the DataFrame to an Excel file named 'output.xlsx'. The index=False argument prevents the index column from being included in the Excel file.
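
The steps above can be sketched as follows. Writing .xlsx files needs an Excel engine such as openpyxl, which is an assumption about your environment, so the sketch catches the ImportError that pandas raises when no engine is installed. The `df_excel` name is illustrative:

```python
import pandas as pd

# Create a sample DataFrame
df_excel = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22]
})

try:
    # Write the DataFrame to an Excel file without the index column
    df_excel.to_excel('output.xlsx', index=False)
    written = True
except ImportError:
    # Raised when no Excel writer engine (e.g. openpyxl) is installed
    written = False
    print("Install an Excel engine first, e.g.: pip install openpyxl")
```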

#### Creating Sample Data for the DataFrame

In [61]:


# Lists of mock data
first_names = ["Alice", "Bob", "Charlie", "David", "Emma", "Frank", "Grace", "Hannah", "Isaac", "Julia"]
last_names = ["Smith", "Johnson", "Williams", "Brown", "Jones", "Miller", "Davis", "Garcia", "Martinez", "Jackson"]
grades = ["A", "B", "C", "D", "F", None]
courses = ["Math", "Science", "History", "English"]

# Create a list to hold the data
data = []

# Generate random student data
for _ in range(10):
    first_name = random.choice(first_names)
    last_name = random.choice(last_names)
    age = random.randint(18, 25)
    grade = random.choice(grades)
    course = random.choice(courses)
    
    data.append([first_name, last_name, age, grade, course])

# Create a DataFrame from the data
print(data)
columns = ["First Name", "Last Name", "Age", "Grade", "Course"]
df = pd.DataFrame(data, columns=columns)

# Introduce some null values
# null_indices = random.sample(range(50), 10)
# df.loc[null_indices, "Grade"] = None

# Display the DataFrame
df


[['Emma', 'Jackson', 18, 'B', 'History'], ['Hannah', 'Martinez', 18, None, 'Science'], ['David', 'Johnson', 25, 'C', 'Science'], ['David', 'Williams', 25, 'B', 'History'], ['Bob', 'Martinez', 21, 'B', 'English'], ['Grace', 'Jones', 20, 'C', 'Math'], ['Isaac', 'Williams', 24, 'D', 'Science'], ['Emma', 'Brown', 18, 'C', 'Science'], ['Emma', 'Martinez', 23, 'F', 'Science'], ['Bob', 'Davis', 25, None, 'Science']]


Unnamed: 0,First Name,Last Name,Age,Grade,Course
0,Emma,Jackson,18,B,History
1,Hannah,Martinez,18,,Science
2,David,Johnson,25,C,Science
3,David,Williams,25,B,History
4,Bob,Martinez,21,B,English
5,Grace,Jones,20,C,Math
6,Isaac,Williams,24,D,Science
7,Emma,Brown,18,C,Science
8,Emma,Martinez,23,F,Science
9,Bob,Davis,25,,Science


#### Database Credentials

In [8]:
postgres_driver="postgresql"
postgres_user="postgres"
postgres_password="hariom"
postgres_host="localhost"
postgres_port="5432"
postgres_db="pandas_practice"

#### Creating the Connection URL with URL.create()

In [9]:
connection_url = sa.engine.URL.create(
    drivername=postgres_driver,
    username=postgres_user,
    password=postgres_password,
    host=postgres_host,
    port=postgres_port,
    database=postgres_db
)
print(connection_url)

postgresql://postgres:***@localhost:5432/pandas_practice


In [10]:
engine = create_engine(connection_url)

#### Writing Data into database table
This code uses the engine connection to replace data in the 'student' table with the contents of DataFrame df, excluding the index. It's a way to write DataFrame data into a database table.

In [11]:
with engine.begin() as conn:
    # Create or obtain your DataFrame `df`
    
    table_name = 'student'  # Replace with your table name
    df.to_sql(table_name, conn, if_exists='replace', index=False)

#### Reading Data From Database table
This code reads all data from the 'student' table in the database using the engine connection and saves it into the DataFrame df1.

In [12]:
with engine.begin() as conn:
    df1 = pd.read_sql_query(sa.text("select * from student;"), conn)
df1

Unnamed: 0,First Name,Last Name,Age,Grade,Course
0,Charlie,Johnson,18,,Science
1,Julia,Jones,21,B,English
2,David,Jones,20,D,Science
3,Isaac,Davis,24,,History
4,David,Smith,25,,History
5,Frank,Johnson,21,F,History
6,Hannah,Brown,21,,English
7,Bob,Brown,18,D,History
8,Alice,Williams,22,,Math
9,David,Williams,23,,Science
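
The write and read steps above can also be tried without a PostgreSQL server. The following is a sketch under the assumption that an in-memory SQLite database is an acceptable stand-in; pandas accepts a plain sqlite3 connection for to_sql() and read_sql_query(). The `students` data and column names are illustrative:

```python
import sqlite3
import pandas as pd

students = pd.DataFrame({
    'First_Name': ['Emma', 'Hannah'],
    'Age': [18, 18],
    'Grade': ['B', None],
})

# In-memory SQLite database; nothing is written to disk
conn = sqlite3.connect(':memory:')
try:
    # Replace the table if it exists and skip the index, as above
    students.to_sql('student', conn, if_exists='replace', index=False)

    # Read the whole table back into a new DataFrame
    students_back = pd.read_sql_query('select * from student;', conn)
finally:
    conn.close()

print(students_back)
```

The None in Grade is stored as SQL NULL and comes back as a missing value.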


#### Creating DataFrame

In [18]:
import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Age': [25, 30, 22, 28, 24],
    'Score': [85, 90, 78, 92, 88],
}

df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,Score
0,Alice,25,85
1,Bob,30,90
2,Charlie,22,78
3,David,28,92
4,Emma,24,88


In [None]:
#list
data1 = [
    ["Alice", 25, 85], 
    ['Bob', 30, 90],
    ["Charlie", 22, 78]
]

columns = ["name", "age", "score"]
df = pd.DataFrame(data1, columns=columns)
df

In [21]:
# tuple
data1 = [
    ("Alice", 25, 85), 
    ('Bob', 30, 90),
    ("Charlie", 22, 78)
]

columns = ("name", "age", "score")
df = pd.DataFrame(data1, columns=columns)
df

Unnamed: 0,name,age,score
0,Alice,25,85
1,Bob,30,90
2,Charlie,22,78


# Data Exploration:
#### Now, let's go through the Data Exploration methods one by one:


## df.head(n=5):

Description: Returns the first n rows of the DataFrame. By default, it returns the first 5 rows.

Parameters: n specifies the number of rows to display (optional).

Usage:

In [25]:
df.head(2)    # Displays the first 2 rows (the default is 5)

Unnamed: 0,name,age,score
0,Alice,25,85
1,Bob,30,90


In [15]:
print(df.head(3))  # Displays first 3 rows

      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   22     78


## df.tail(n=5):

Description: Returns the last n rows of the DataFrame. By default, it returns the last 5 rows.

Parameters: n specifies the number of rows to display (optional).

Usage:

In [26]:
print(df.tail())    # Default: displays the last 5 rows (all rows if there are fewer)

      name  age  score
0    Alice   25     85
1      Bob   30     90
2  Charlie   22     78


In [17]:
print(df.tail(2))   # Displays last 2 rows

    Name  Age  Score
3  David   28     92
4   Emma   24     88


## df.info():

Description: Provides a concise summary of the DataFrame, including data types, non-null values, and memory usage.

Parameters: None required (optional arguments such as verbose and memory_usage adjust the output).

Usage:

In [28]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   age     3 non-null      int64 
 2   score   3 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 204.0+ bytes


## df.describe():

Description: Generates descriptive statistics of the numeric columns in the DataFrame, including mean, standard deviation, min, max, quartiles, etc.

Parameters: None required (optional arguments include percentiles, include, and exclude).

Usage:

In [30]:
print(df)
print(df.describe())

      name  age  score
0    Alice   25     85
1      Bob   30     90
2  Charlie   22     78
             age      score
count   3.000000   3.000000
mean   25.666667  84.333333
std     4.041452   6.027714
min    22.000000  78.000000
25%    23.500000  81.500000
50%    25.000000  85.000000
75%    27.500000  87.500000
max    30.000000  90.000000
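
When a DataFrame contains numeric columns, describe() summarizes only those by default, which is why name is absent from the output above. A short sketch of the include parameter, using a frame rebuilt here so the example is self-contained (the `people` name is illustrative):

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [85, 90, 78],
})

# Default: numeric columns only
numeric_summary = people.describe()

# include='all' adds non-numeric columns, with rows such as
# 'unique', 'top' and 'freq' filled in for them
full_summary = people.describe(include='all')

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())
```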


## df.shape:

Description: Returns a tuple representing the dimensions of the DataFrame (number of rows, number of columns).

Parameters: None (shape is an attribute, so it is accessed without parentheses).

Usage:

In [31]:
print(df.shape)    # Output: (3, 3) indicating 3 rows and 3 columns

(3, 3)


## df.columns:

Description: Returns the column labels of the DataFrame.

Parameters: None (columns is an attribute, so it is accessed without parentheses).

Usage:

In [32]:
print(df.columns)  # Output: Index(['name', 'age', 'score'], dtype='object')

Index(['name', 'age', 'score'], dtype='object')


# Data Selection and Indexing:

## Data Accessing Techniques

In [42]:
df['age']


print(type(df['age']))

<class 'pandas.core.series.Series'>


In [43]:
df[['age']]


print(type(df[['age']]))

<class 'pandas.core.frame.DataFrame'>


In [44]:
df[['age', 'score']]

Unnamed: 0,age,score
0,25,85
1,30,90
2,22,78


In [48]:
df[['age', 'score']][1:3]  # Positional row slice, like list slicing: l = [1, 2, 3, 4, 5]; l[1:3]

Unnamed: 0,age,score
1,30,90
2,22,78


In [53]:
df[df.index == 1]

Unnamed: 0,name,age,score
1,Bob,30,90


In [54]:
df[df.index == "Alice"]  # Empty result: the index holds the integers 0-2, not names

Unnamed: 0,name,age,score


## df.loc[] - Label-based Indexing:

## subset = df.loc[row_labels, column_labels]

Explanation: This method allows you to access rows and columns by using labels (index and column names).

Usage:

In [57]:
sliced_data = df[['age', 'score']][1:3]
sliced_data

Unnamed: 0,age,score
1,30,90
2,22,78


In [63]:
data = [
    ['Emma', 'Jackson', 18, 'B', 'History'], 
    ['Hannah', 'Martinez', 18, None, 'Science'], 
    ['David', 'Johnson', 25, 'C', 'Science'], 
    ['David', 'Williams', 25, 'B', 'History'], 
    ['Bob', 'Martinez', 21, 'B', 'English'], 
    ['Grace', 'Jones', 20, 'C', 'Math'], 
    ['Isaac', 'Williams', 24, 'D', 'Science'], 
    ['Emma', 'Brown', 18, 'C', 'Science'], 
    ['Emma', 'Martinez', 23, 'F', 'Science'], 
    ['Bob', 'Davis', 25, None, 'Science']
]
columns = ["First_Name", "Last_Name", "Age", "Grade", "Course"]
df = pd.DataFrame(data, columns=columns)
df

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
0,Emma,Jackson,18,B,History
1,Hannah,Martinez,18,,Science
2,David,Johnson,25,C,Science
3,David,Williams,25,B,History
4,Bob,Martinez,21,B,English
5,Grace,Jones,20,C,Math
6,Isaac,Williams,24,D,Science
7,Emma,Brown,18,C,Science
8,Emma,Martinez,23,F,Science
9,Bob,Davis,25,,Science


In [65]:
sliced_data = df[['Age', 'Course']][1:3]
sliced_data

Unnamed: 0,Age,Course
1,18,Science
2,25,Science


In [72]:
l1 = [1, 2, 3, 4]
l2 = l1
print(id(l1), id(l2))


l2.append(5)
print(l2)
print(l1)



2182449163328 2182449163328
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]


In [77]:
df.head()

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
0,Emma,Jackson,18,B,History
1,Hannah,Martinez,18,,Science
2,David,Johnson,25,C,Science
3,David,Williams,25,B,History
4,Bob,Martinez,21,B,English


In [78]:
df.loc[3]

First_Name       David
Last_Name     Williams
Age                 25
Grade                B
Course         History
Name: 3, dtype: object

In [79]:
df.loc[[3]]

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
3,David,Williams,25,B,History


In [83]:
df.loc[2:4,"First_Name":"Age"]

Unnamed: 0,First_Name,Last_Name,Age
2,David,Johnson,25
3,David,Williams,25
4,Bob,Martinez,21


In [85]:
df.loc[1:3, 'First_Name':'Course']  # Select rows 1 to 3, columns 'First_Name' to 'Course'

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
1,Hannah,Martinez,18,,Science
2,David,Johnson,25,C,Science
3,David,Williams,25,B,History


In [96]:
# df.loc[1:2,]
df.loc[1:2]

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
1,Hannah,Martinez,18,,Science
2,David,Johnson,25,C,Science


In [97]:
# df.loc[:, 'Age':'Course']
df.loc[:, 'Age':'Course']

Unnamed: 0,Age,Grade,Course
0,18,B,History
1,18,,Science
2,25,C,Science
3,25,B,History
4,21,B,English
5,20,C,Math
6,24,D,Science
7,18,C,Science
8,23,F,Science
9,25,,Science


In [110]:
df.loc[2:3,:"Age"]

Unnamed: 0,First_Name,Last_Name,Age
2,David,Johnson,25
3,David,Williams,25


In [113]:
df.loc[[1, 3], ["First_Name", "Age", "Course"]]

Unnamed: 0,First_Name,Age,Course
1,Hannah,18,Science
3,David,25,History


## df.iloc[] - Integer-based Indexing:

## subset = df.iloc[row_positions, column_positions]

Explanation: This method allows you to access rows and columns by using integer positions.

Usage:

In [114]:
data = [
    ['Emma', 'Jackson', 18, 'B', 'History'], 
    ['Hannah', 'Martinez', 18, None, 'Science'], 
    ['David', 'Johnson', 25, 'C', 'Science'], 
    ['David', 'Williams', 25, 'B', 'History'], 
    ['Bob', 'Martinez', 21, 'B', 'English'], 
    ['Grace', 'Jones', 20, 'C', 'Math'], 
    ['Isaac', 'Williams', 24, 'D', 'Science'], 
    ['Emma', 'Brown', 18, 'C', 'Science'], 
    ['Emma', 'Martinez', 23, 'F', 'Science'], 
    ['Bob', 'Davis', 25, None, 'Science']
]
columns = ["First_Name", "Last_Name", "Age", "Grade", "Course"]
df = pd.DataFrame(data, columns=columns)
df

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
0,Emma,Jackson,18,B,History
1,Hannah,Martinez,18,,Science
2,David,Johnson,25,C,Science
3,David,Williams,25,B,History
4,Bob,Martinez,21,B,English
5,Grace,Jones,20,C,Math
6,Isaac,Williams,24,D,Science
7,Emma,Brown,18,C,Science
8,Emma,Martinez,23,F,Science
9,Bob,Davis,25,,Science


In [121]:
# df.iloc[3]
# df.iloc[[3]]
df1 = df.iloc[2:5, 1:3]
df1

Unnamed: 0,Last_Name,Age
2,Johnson,25
3,Williams,25
4,Martinez,21


In [124]:
df.index = ['A', 1, 'B', 3, 'C', 'D', 4, 'E', 5, "F"]
# df
# df.loc[1:'C', ['First_Name', 'Age']]
df.iloc[1:5, [1, 3]]

Unnamed: 0,Last_Name,Grade
1,Martinez,
B,Johnson,C
3,Williams,B
C,Martinez,B
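
Label slices and position slices treat their endpoints differently, which is easy to miss in the cells above. A minimal sketch, using an illustrative frame with integer labels that deliberately differ from the row positions:

```python
import pandas as pd

frame = pd.DataFrame(
    {'Age': [18, 18, 25, 25, 21]},
    index=[10, 20, 30, 40, 50],
)

# .loc slices by label and includes BOTH endpoints
by_label = frame.loc[20:40]

# .iloc slices by position and excludes the stop position,
# like ordinary Python slicing
by_position = frame.iloc[1:3]

print(by_label)
print(by_position)
```

Here `frame.loc[20:40]` returns three rows, while `frame.iloc[1:3]` returns two.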


In [125]:
df.index = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# df.loc[5:6,]
df.iloc[5:6,]

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
5,Grace,Jones,20,C,Math


In [126]:
df.iloc[0:4, 0:3]

Unnamed: 0,First_Name,Last_Name,Age
0,Emma,Jackson,18
1,Hannah,Martinez,18
2,David,Johnson,25
3,David,Williams,25


In [127]:
df.iloc[1:3, :]

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
1,Hannah,Martinez,18,,Science
2,David,Johnson,25,C,Science


In [128]:
df.iloc[1:2]
# or
df.iloc[1:2, :]

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
1,Hannah,Martinez,18,,Science


In [129]:
df.iloc[:, 1:3]

Unnamed: 0,Last_Name,Age
0,Jackson,18
1,Martinez,18
2,Johnson,25
3,Williams,25
4,Martinez,21
5,Jones,20
6,Williams,24
7,Brown,18
8,Martinez,23
9,Davis,25


In [130]:
df.iloc[:, :]

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
0,Emma,Jackson,18,B,History
1,Hannah,Martinez,18,,Science
2,David,Johnson,25,C,Science
3,David,Williams,25,B,History
4,Bob,Martinez,21,B,English
5,Grace,Jones,20,C,Math
6,Isaac,Williams,24,D,Science
7,Emma,Brown,18,C,Science
8,Emma,Martinez,23,F,Science
9,Bob,Davis,25,,Science


In [131]:
df.iloc[[2, 1, 5], [3, 2]]

Unnamed: 0,Grade,Age
2,C,25
1,,18
5,C,20


## df.at[] - Scalar Access by Label:

## value = df.at[row_label, column_label]

Explanation: This method provides fast access to a single scalar value using labels.

Usage:

In [132]:
df.at[2, 'Age']  # Gets the age value at row with label 2 and column 'Age'

25

In [133]:
df.index = ['A', 1, 'B', 3, 'C', 'D', 4, 'E', 5, "F"]
df.at['A', 'First_Name']

'Emma'

In [134]:
df

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
A,Emma,Jackson,18,B,History
1,Hannah,Martinez,18,,Science
B,David,Johnson,25,C,Science
3,David,Williams,25,B,History
C,Bob,Martinez,21,B,English
D,Grace,Jones,20,C,Math
4,Isaac,Williams,24,D,Science
E,Emma,Brown,18,C,Science
5,Emma,Martinez,23,F,Science
F,Bob,Davis,25,,Science


## .iat[] Method:

## value = df.iat[row_position, column_position]


The .iat[] indexer provides fast access to a single scalar value in a DataFrame or Series using integer-based indexing. It's used when you want to access a specific value at the intersection of a particular row and column.
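
As a sketch of how .at[] and .iat[] relate, the example below addresses the same cell once by label and once by position, then uses both for assignment. The `scores` frame is an illustrative assumption:

```python
import pandas as pd

scores = pd.DataFrame(
    {'Name': ['Alice', 'Bob'], 'Score': [85, 90]},
    index=['x', 'y'],
)

# Same cell, addressed by label and by position
by_label = scores.at['y', 'Score']
by_position = scores.iat[1, 1]

# Both indexers also support assignment
scores.at['x', 'Score'] = 88
scores.iat[1, 1] = 95

print(by_label, by_position)
print(scores)
```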

In [135]:
df.iat[5, 1]  # Output: 'Jones' (6th row, 2nd column)

'Jones'

In [136]:
df.iat[3, 2] = 45  # 4th row, 3rd column

In [137]:
df

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
A,Emma,Jackson,18,B,History
1,Hannah,Martinez,18,,Science
B,David,Johnson,25,C,Science
3,David,Williams,45,B,History
C,Bob,Martinez,21,B,English
D,Grace,Jones,20,C,Math
4,Isaac,Williams,24,D,Science
E,Emma,Brown,18,C,Science
5,Emma,Martinez,23,F,Science
F,Bob,Davis,25,,Science


## .isin() Method:

## bool_mask = df.isin(values)

The .isin() method creates a Boolean mask indicating whether each element is in a given list of values. It's commonly used for conditional filtering and selection based on multiple values.
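
A compact sketch of the common patterns: filtering with the mask, inverting it with ~, and passing a dict to DataFrame.isin() so each column is checked against its own list. The `students` frame is illustrative:

```python
import pandas as pd

students = pd.DataFrame({
    'First_Name': ['Emma', 'David', 'Bob'],
    'Age': [18, 25, 25],
    'Course': ['History', 'Science', 'Science'],
})

# Series.isin: one True/False per row
age_mask = students['Age'].isin([25, 28])

# Keep the rows where the mask is True
matches = students[age_mask]

# ~ inverts the mask to keep the non-matching rows
non_matches = students[~age_mask]

# With a dict, each column is compared only with its own values;
# columns not named in the dict come back as all False
per_column = students.isin({'Age': [25], 'Course': ['History']})

print(matches)
print(non_matches)
print(per_column)
```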

In [138]:
df

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
A,Emma,Jackson,18,B,History
1,Hannah,Martinez,18,,Science
B,David,Johnson,25,C,Science
3,David,Williams,45,B,History
C,Bob,Martinez,21,B,English
D,Grace,Jones,20,C,Math
4,Isaac,Williams,24,D,Science
E,Emma,Brown,18,C,Science
5,Emma,Martinez,23,F,Science
F,Bob,Davis,25,,Science


In [139]:
df['Age'].isin([25, 28])

A    False
1    False
B     True
3    False
C    False
D    False
4    False
E    False
5    False
F     True
Name: Age, dtype: bool

In [140]:
df[df['Age'].isin([25, 28])]

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
B,David,Johnson,25,C,Science
F,Bob,Davis,25,,Science


In [141]:
df.isin([25, 'David', 90])

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
A,False,False,False,False,False
1,False,False,False,False,False
B,True,False,True,False,False
3,True,False,False,False,False
C,False,False,False,False,False
D,False,False,False,False,False
4,False,False,False,False,False
E,False,False,False,False,False
5,False,False,False,False,False
F,False,False,True,False,False


In [142]:
df

Unnamed: 0,First_Name,Last_Name,Age,Grade,Course
A,Emma,Jackson,18,B,History
1,Hannah,Martinez,18,,Science
B,David,Johnson,25,C,Science
3,David,Williams,45,B,History
C,Bob,Martinez,21,B,English
D,Grace,Jones,20,C,Math
4,Isaac,Williams,24,D,Science
E,Emma,Brown,18,C,Science
5,Emma,Martinez,23,F,Science
F,Bob,Davis,25,,Science
