In [1]:
import pandas as pd
import matplotlib.pyplot as plt

Pandas is an open-source data manipulation and analysis library for Python. It provides data structures for efficiently handling large datasets and tools for working with structured data. Here's an overview of key aspects of the Pandas library:

1. Data Structures:

Series: A one-dimensional array capable of holding any data type. It's similar to a column in a spreadsheet or a single variable in statistics.

DataFrame: A two-dimensional table with rows and columns. It can be thought of as a spreadsheet or SQL table, where each column can be a different data type.

2. Data Loading and Saving:

Pandas provides functions to read data from various file formats such as CSV, Excel, SQL databases, JSON, and more.

It also allows you to write data back to these formats.

3. Data Cleaning and Preparation:

Pandas offers powerful tools for cleaning and preparing data, including handling missing values, filtering, sorting, and merging datasets.

Methods like dropna(), fillna(), and duplicated() are commonly used for data cleaning.

4. Indexing and Selecting Data:

Pandas supports both label-based and position-based indexing through the loc and iloc accessors, respectively.

Conditional (boolean) indexing selects the rows for which a condition holds, for example df[df['Age'] >= 30].

5. Operations and Transformations:

Pandas supports element-wise operations between series and dataframes, similar to NumPy.

It provides various statistical and mathematical operations, along with methods like groupby for aggregating data.

6. Visualization:

While Pandas itself is not a visualization library, it integrates well with Matplotlib and Seaborn for creating visualizations.

DataFrames have built-in plotting methods for quick exploratory data visualization.

7. Integration with Other Libraries:

Pandas is often used in conjunction with other libraries like NumPy, Matplotlib, and Scikit-Learn to form a powerful data analysis and machine learning toolkit.

8. Use Cases:

Pandas is widely used for data cleaning, exploration, and analysis in fields such as finance, economics, social sciences, and more.

It is a foundational tool for data scientists, analysts, and researchers working with structured data.

Pandas is a versatile library that plays a crucial role in the Python ecosystem for data analysis. Its flexibility and ease of use make it a popular choice for handling and analyzing tabular data.
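
As a small illustration of the element-wise operations mentioned in the overview, the sketch below adds two Series. The Series `a` and `b` and their labels are made up for this example. The point it tries to show is that pandas aligns operands on index labels rather than on position.

```python
import pandas as pd

# Two hypothetical Series with partially overlapping labels
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Arithmetic aligns on index labels; labels found in only one operand give NaN
total = a + b
print(total)

# Series.add with fill_value treats a missing label as 0 instead of NaN
total_filled = a.add(b, fill_value=0)
print(total_filled)
```

Here only `y` and `z` exist in both inputs, so the plain sum has two missing values, while the `fill_value` version keeps all four labels.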

Series:

A Series is a one-dimensional array-like object in Pandas.

It is similar to a column in a spreadsheet or a single variable in statistics.

Each element in a Series has an associated index, which can be explicitly set or is automatically generated.

You can create a Series from a Python list, NumPy array, or dictionary.
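
The three construction routes listed above can be sketched as follows. The variable names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# From a list: pandas generates a default integer index (0, 1, 2)
from_list = pd.Series([10, 20, 30])

# From a NumPy array with an explicitly supplied index
from_array = pd.Series(np.array([1.5, 2.5]), index=["a", "b"])

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"x": 1, "y": 2})

print(from_list.index.tolist())
print(from_array["b"])
print(from_dict.index.tolist())
```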

### 1. Series 
A Pandas Series is a one-dimensional array of indexed data.
#### Syntax : pd.Series(data, index=index)

In [2]:
import pandas as pd
import numpy as np

# Creating a Series from a list
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
# The index is generated automatically. Because NaN (a missing value) is a float,
# the integers are upcast and the Series gets dtype float64.

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64


In [3]:
s.describe()

count    5.000000
mean     4.600000
std      2.701851
min      1.000000
25%      3.000000
50%      5.000000
75%      6.000000
max      8.000000
dtype: float64

In [4]:
s.info()

<class 'pandas.core.series.Series'>
RangeIndex: 6 entries, 0 to 5
Series name: None
Non-Null Count  Dtype  
--------------  -----  
5 non-null      float64
dtypes: float64(1)
memory usage: 180.0 bytes


DataFrame:

A DataFrame is a two-dimensional table with rows and columns, similar to a spreadsheet or a SQL table.

It is the primary data structure in Pandas and is used for most data manipulation and analysis tasks.

Each column in a DataFrame is a Series.

DataFrame allows you to handle heterogeneous data types and supports a wide range of operations.
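
A minimal sketch of the two points above: selecting one column returns a Series, and each column keeps its own dtype. The `people` frame is a made-up example.

```python
import pandas as pd

# Hypothetical frame with text, integer, and floating-point columns
people = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Salary": [60000.0, 75000.0],
})

# Selecting a single column returns a Series
age = people["Age"]
print(type(age).__name__)

# Each column carries its own dtype
print(people.dtypes)
```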

### 2. DataFrames

A DataFrame is the Pandas data structure for tabular data, which is the form most day-to-day data work takes.

#### Syntax : pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

In [5]:
# Creating a DataFrame from a dictionary
# Sample data for the DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 28, 24],
    'City': ['New York', 'London', 'Tokyo', 'Paris', 'Berlin'],
    'Salary': [60000, 75000, 80000, 62000, 57000],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
    'Category': ['A', 'B', 'C', 'A', 'B']
}
df = pd.DataFrame(data)
df.to_csv('data.csv')  # the index is written as well; pass index=False to leave it out
# df.to_excel('data.xlsx', index=False, sheet_name='Sheet1')
# df.to_json('data.json', orient='records')
# from sqlalchemy import create_engine
# engine = create_engine('sqlite:///data.db')
# df.to_sql('my_table', engine, index=False, if_exists='replace')
df

Unnamed: 0,Name,Age,City,Salary,Date,Category
0,Alice,25,New York,60000,2023-01-01,A
1,Bob,30,London,75000,2023-01-02,B
2,Charlie,35,Tokyo,80000,2023-01-03,C
3,David,28,Paris,62000,2023-01-04,A
4,Eva,24,Berlin,57000,2023-01-05,B


In [6]:
df = pd.read_csv(r"D:\Python\data.csv")
df

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,Category
0,0,Alice,25,New York,60000,2023-01-01,A
1,1,Bob,30,London,75000,2023-01-02,B
2,2,Charlie,35,Tokyo,80000,2023-01-03,C
3,3,David,28,Paris,62000,2023-01-04,A
4,4,Eva,24,Berlin,57000,2023-01-05,B
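
The `Unnamed: 0` column in the frame read back above comes from `to_csv` writing the row index as an unnamed first column. The sketch below reproduces this with an in-memory buffer instead of a file (an assumption made to keep the example self-contained) and shows two common ways to avoid it.

```python
import io
import pandas as pd

small = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
small.to_csv(with_index)
with_index.seek(0)
cols_default = pd.read_csv(with_index).columns.tolist()
print(cols_default)

# Option 1: do not write the index at all
without_index = io.StringIO()
small.to_csv(without_index, index=False)
without_index.seek(0)
cols_clean = pd.read_csv(without_index).columns.tolist()
print(cols_clean)

# Option 2: tell read_csv that the first column is the index
with_index.seek(0)
restored = pd.read_csv(with_index, index_col=0)
print(restored.columns.tolist())
```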


In [7]:
# Reading data from a CSV file
df = pd.read_csv('data.csv')
# pd.read_excel('data.xlsx')
# pd.read_json('data.json')
# import sqlite3
# pd.read_sql('SELECT * FROM my_table', sqlite3.connect('data.db'))

# Exploring Data

In [8]:
# Display the first few rows
#df.head(2)
# Display the last few rows
df.tail(2)

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,Category
3,3,David,28,Paris,62000,2023-01-04,A
4,4,Eva,24,Berlin,57000,2023-01-05,B


In [9]:
# Basic summary statistics
df.describe()

Unnamed: 0.1,Unnamed: 0,Age,Salary
count,5.0,5.0,5.0
mean,2.0,28.4,66800.0
std,1.581139,4.393177,10084.641788
min,0.0,24.0,57000.0
25%,1.0,25.0,60000.0
50%,2.0,28.0,62000.0
75%,3.0,30.0,75000.0
max,4.0,35.0,80000.0


In [10]:
# Information about the DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  5 non-null      int64 
 1   Name        5 non-null      object
 2   Age         5 non-null      int64 
 3   City        5 non-null      object
 4   Salary      5 non-null      int64 
 5   Date        5 non-null      object
 6   Category    5 non-null      object
dtypes: int64(3), object(4)
memory usage: 412.0+ bytes


In [11]:
df['Name']

0      Alice
1        Bob
2    Charlie
3      David
4        Eva
Name: Name, dtype: object

In [12]:
# Accessing specific columns
df[['Name','Age']]

Unnamed: 0,Name,Age
0,Alice,25
1,Bob,30
2,Charlie,35
3,David,28
4,Eva,24


# Data Cleaning and Preprocessing:

In [13]:
data = pd.Series([1, 2, None, 4, None, 6])
data

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
5    6.0
dtype: float64

In [14]:
data.isna()

0    False
1    False
2     True
3    False
4     True
5    False
dtype: bool

In [15]:
data = pd.DataFrame({'A': [1, 2, None, 4, None, 6], 'B': [7, None, 9, None, 11, None]})
data

Unnamed: 0,A,B
0,1.0,7.0
1,2.0,
2,,9.0
3,4.0,
4,,11.0
5,6.0,


In [16]:
data.isna()

Unnamed: 0,A,B
0,False,False
1,False,True
2,True,False
3,False,True
4,True,False
5,False,True


In [17]:
data.dropna()

Unnamed: 0,A,B
0,1.0,7.0
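
Calling `dropna()` with no arguments removes every row that has at least one missing value, which here leaves a single row. A hedged sketch of some less drastic options, using the same values as the frame above (the name `frame` is only for this example):

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, None, 4, None, 6],
                      "B": [7, None, 9, None, 11, None]})

# Drop a row only when column A is missing
only_a = frame.dropna(subset=["A"])

# Drop a row only when every column is missing
all_missing = frame.dropna(how="all")

# Replace missing values with the mean of each column
filled = frame.fillna(frame.mean())

print(len(only_a), len(all_missing))
print(filled)
```

Which option is appropriate depends on the data. Mean imputation in particular is only one of several possible strategies.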


In [18]:
data = pd.DataFrame({'A': [1, 2, None, 4, None, 6], 'B': [7, None, 9, None, 11, None]})
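# Assign the result back; calling fillna(inplace=True) on a selected column is
# chained assignment and may not update the original frame in newer pandas versions
data['A'] = data['A'].fillna(0)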
data

Unnamed: 0,A,B
0,1.0,7.0
1,2.0,
2,0.0,9.0
3,4.0,
4,0.0,11.0
5,6.0,


In [19]:
data = pd.Series([1, 2, None, 4, None, 6])
data.interpolate(inplace=True)
print(data)

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
dtype: float64


In [20]:
data = pd.Series([1, None, None, 4, None, 6,None])
# Forward-fill: carry the last valid value forward (fillna(method='ffill') is deprecated)
data = data.ffill()
data.drop_duplicates()


0    1.0
3    4.0
5    6.0
dtype: float64

In [21]:
# Handling missing values
df.dropna()  # Drop rows with any missing values

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,Category
0,0,Alice,25,New York,60000,2023-01-01,A
1,1,Bob,30,London,75000,2023-01-02,B
2,2,Charlie,35,Tokyo,80000,2023-01-03,C
3,3,David,28,Paris,62000,2023-01-04,A
4,4,Eva,24,Berlin,57000,2023-01-05,B


In [22]:
# Removing duplicates
df.drop_duplicates()

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,Category
0,0,Alice,25,New York,60000,2023-01-01,A
1,1,Bob,30,London,75000,2023-01-02,B
2,2,Charlie,35,Tokyo,80000,2023-01-03,C
3,3,David,28,Paris,62000,2023-01-04,A
4,4,Eva,24,Berlin,57000,2023-01-05,B


In [23]:
# Convert text to lowercase
df['Name1'] = df['Name'].str.lower()

In [24]:
# Converting data types
df['Age'] = df['Age'].astype('float')
df = df.drop('Name1',axis=1)
df

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,Category
0,0,Alice,25.0,New York,60000,2023-01-01,A
1,1,Bob,30.0,London,75000,2023-01-02,B
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C
3,3,David,28.0,Paris,62000,2023-01-04,A
4,4,Eva,24.0,Berlin,57000,2023-01-05,B


In [25]:
df["Age"]

0    25.0
1    30.0
2    35.0
3    28.0
4    24.0
Name: Age, dtype: float64

In [26]:
df.rename(columns={'Category': 'cate'}, inplace=True)
df

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate
0,0,Alice,25.0,New York,60000,2023-01-01,A
1,1,Bob,30.0,London,75000,2023-01-02,B
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C
3,3,David,28.0,Paris,62000,2023-01-04,A
4,4,Eva,24.0,Berlin,57000,2023-01-05,B


In [27]:
#Data Selection and Indexing:
# Select rows based on a condition
df[df['Age'] >= 30]

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate
1,1,Bob,30.0,London,75000,2023-01-02,B
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C


In [28]:
# Select specific columns
df[['Name', 'City']]

Unnamed: 0,Name,City
0,Alice,New York
1,Bob,London
2,Charlie,Tokyo
3,David,Paris
4,Eva,Berlin


In [29]:
df[2:3]

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C


In [30]:
df[-3:-1:1]

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C
3,3,David,28.0,Paris,62000,2023-01-04,A
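
The plain slices above select rows by position. The sketch below contrasts label-based `loc` with position-based `iloc` on a small made-up frame whose index labels (10, 20, 30) are deliberately different from the row positions.

```python
import pandas as pd

cities = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"],
     "City": ["New York", "London", "Tokyo"]},
    index=[10, 20, 30],  # hypothetical non-default labels
)

# loc uses index labels and column names
by_label = cities.loc[20, "City"]

# iloc uses integer positions for both rows and columns
by_position = cities.iloc[1, 1]

# Label slices include the end label; position slices exclude the end position
label_slice = cities.loc[10:20]
position_slice = cities.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```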


# Data Exploration:

In [31]:
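# A notebook cell displays only the value of its last expression;
# wrap the other lines in print() to see their results as well
df['Age'].mean()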
df['Salary'].std()
df['Salary'].median()
df['Age'].min()
df['Age'].max()
df['Salary'].sum()

334000

In [32]:
# Data Aggregation:
# Group by a column and calculate the mean for each group
df.groupby('City')['Age'].mean()

City
Berlin      24.0
London      30.0
New York    25.0
Paris       28.0
Tokyo       35.0
Name: Age, dtype: float64

In [33]:
print(df)
# Calculate multiple aggregations simultaneously
df.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})

   Unnamed: 0     Name   Age      City  Salary        Date cate
0           0    Alice  25.0  New York   60000  2023-01-01    A
1           1      Bob  30.0    London   75000  2023-01-02    B
2           2  Charlie  35.0     Tokyo   80000  2023-01-03    C
3           3    David  28.0     Paris   62000  2023-01-04    A
4           4      Eva  24.0    Berlin   57000  2023-01-05    B


Unnamed: 0_level_0,Age,Salary
City,Unnamed: 1_level_1,Unnamed: 2_level_1
Berlin,24.0,57000
London,30.0,75000
New York,25.0,60000
Paris,28.0,62000
Tokyo,35.0,80000
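
Every city in this data set appears once, so grouping by city returns the original values. A sketch with a grouping key that repeats, using the category, age, and salary values from the sample data and the keyword form of `agg` (named aggregation) to label the output columns. The names `staff`, `mean_age`, `total_salary`, and `headcount` are chosen for this example.

```python
import pandas as pd

staff = pd.DataFrame({
    "cate": ["A", "B", "C", "A", "B"],
    "Age": [25, 30, 35, 28, 24],
    "Salary": [60000, 75000, 80000, 62000, 57000],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = staff.groupby("cate").agg(
    mean_age=("Age", "mean"),
    total_salary=("Salary", "sum"),
    headcount=("Age", "count"),
)
print(summary)
```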


In [34]:
print("before", df.loc[4, 'Name'])
# Assign through .loc; the chained form df['Name'][4] = ... triggers a
# SettingWithCopyWarning because it may write to a temporary copy
df.loc[4, 'Name'] = 'Rohith'
print("after", df.loc[4, 'Name'])

before Eva
after Rohith


In [35]:
# Sorting data
df.sort_values(by='Age', ascending=False)

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C
1,1,Bob,30.0,London,75000,2023-01-02,B
3,3,David,28.0,Paris,62000,2023-01-04,A
0,0,Alice,25.0,New York,60000,2023-01-01,A
4,4,Rohith,24.0,Berlin,57000,2023-01-05,B


In [37]:
# Assign ranks based on Age
df['Rank'] = df['Age'].rank(ascending=True, method='average')
df

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate,Rank
0,0,Alice,25.0,New York,60000,2023-01-01,A,2.0
1,1,Bob,30.0,London,75000,2023-01-02,B,4.0
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C,5.0
3,3,David,28.0,Paris,62000,2023-01-04,A,3.0
4,4,Rohith,24.0,Berlin,57000,2023-01-05,B,1.0


# Data Manipulation:

In [67]:
# Accessing columns
#print(df['Name'])  # Output: Series with names
#print(df['Age'])
#print(df.loc[1])
#print(df.iloc[1])
#print(df.loc[2:3])
#print(df.iloc[1:4])
# Select rows and columns using .loc and .iloc
#print(df.loc[0, "City"])
#print(df.iloc[1, 1])
df

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate,Rank
0,0,Alice,25.0,New York,60000,2023-01-01,A,2.0
1,1,Bob,30.0,London,75000,2023-01-02,B,4.0
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C,5.0
3,3,David,28.0,Paris,62000,2023-01-04,A,3.0
4,4,Rohith,24.0,Berlin,57000,2023-01-05,B,1.0
5,5,Roshan,20.0,Blore,20000,2023-01-07,B,5.0
6,6,Roshan,25.0,Blore,20000,2023-01-06,A,5.0
7,7,sameer,,Blore,20000,2023-01-06,,5.0
8,8,sameer,,Paris,20000,2023-01-06,A,5.0


In [68]:
#df.loc[7] = {'Name':'Roshan',"Age":25,'City':'Blore','Salary':20000,'Date':'2023-01-06','Category':"b"}
#print(df)
#df.loc[8] = [8,'sameer', None, 'Paris',20000, '2023-01-06','A', 5.0]
#df = df.drop(7)

In [39]:
# Keep only the rows where Age is greater than 30
df_filtered = df[df['Age'] > 30]
df_filtered

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate,Rank
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C,5.0


In [40]:
df_filtered = df[(df['Age'] > 20) & (df['Name'] == 'David')]
df_filtered

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate,Rank
3,3,David,28.0,Paris,62000,2023-01-04,A,3.0


In [41]:
df.columns

Index(['Unnamed: 0', 'Name', 'Age', 'City', 'Salary', 'Date', 'cate', 'Rank'], dtype='object')

In [42]:
df.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})

Unnamed: 0_level_0,Age,Salary
City,Unnamed: 1_level_1,Unnamed: 2_level_1
Berlin,24.0,57000
London,30.0,75000
New York,25.0,60000
Paris,28.0,62000
Tokyo,35.0,80000


In [43]:
df_filtered = df.query('Age > 30')
df_filtered

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate,Rank
2,2,Charlie,35.0,Tokyo,80000,2023-01-03,C,5.0


In [44]:
df_filtered = df[df['City'].isin(['New York', 'San Francisco'])]
df_filtered

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate,Rank
0,0,Alice,25.0,New York,60000,2023-01-01,A,2.0


In [45]:
# Filtering rows based on multiple conditions
df[(df['Age'] == 25) & (df['City'] == 'New York')]

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate,Rank
0,0,Alice,25.0,New York,60000,2023-01-01,A,2.0


In [46]:
# Apply
# Sample data
data = [1, 2, 3, 4, 5]
series = pd.Series(data)

# Define a function to double the value of each element
def double(x):
    return x * 2

# Apply the function to the Series
result = series.apply(double)
print(result)

0     2
1     4
2     6
3     8
4    10
dtype: int64


In [47]:
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df3 = pd.DataFrame(data)
print(df3)
# Define a function to double the value of each element
def double(x):
    return x * 2

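# apply passes each column (a Series) to the function by default
result = df3.apply(double)
# applymap calls the function on every single element
# (deprecated in favor of DataFrame.map since pandas 2.1)
result1= df3.applymap(double)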
print(result1)
print(result)

   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
   A   B   C
0  2   8  14
1  4  10  16
2  6  12  18
   A   B   C
0  2   8  14
1  4  10  16
2  6  12  18
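
Doubling gives the same result either way, so the two outputs above cannot show how `apply` differs from an element-wise call. A sketch with a reducing function makes the `axis` argument visible. The frame `grid` repeats the values of `df3`.

```python
import pandas as pd

grid = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# axis=0 (the default): the function receives each column, one result per column
col_sums = grid.apply(lambda s: s.sum())

# axis=1: the function receives each row, one result per row
row_sums = grid.apply(lambda s: s.sum(), axis=1)

print(col_sums)
print(row_sums)
```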


In [48]:
left_data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 28, 24],
}

left_df = pd.DataFrame(left_data)

# Sample data for right DataFrame
right_data = {
    'ID': [1, 2, 4, 5, 6],
    'City': ['New York', 'London', 'Paris', 'Berlin', 'Tokyo'],
    'Salary': [60000, 75000, 62000, 57000, 80000],
}

right_df = pd.DataFrame(right_data)

In [49]:
# Data Merging and Joining:
# Merging DataFrames based on a common column
merged_df = pd.merge(left_df, right_df, on='ID')
merged_df

Unnamed: 0,ID,Name,Age,City,Salary
0,1,Alice,25,New York,60000
1,2,Bob,30,London,75000
2,4,David,28,Paris,62000
3,5,Eva,24,Berlin,57000


In [50]:
# Joining DataFrames on the 'ID' column using the four join types
# Inner join on 'ID' column
result_inner = pd.merge(left_df, right_df, on='ID', how='inner')
print(result_inner)

# Left join on 'ID' column
result_left = pd.merge(left_df, right_df, on='ID', how='left')
print(result_left)

# Right join on 'ID' column
result_right = pd.merge(left_df, right_df, on='ID', how='right')
print(result_right)

# Outer join on 'ID' column
result_outer = pd.merge(left_df, right_df, on='ID', how='outer')
print(result_outer)

   ID   Name  Age      City  Salary
0   1  Alice   25  New York   60000
1   2    Bob   30    London   75000
2   4  David   28     Paris   62000
3   5    Eva   24    Berlin   57000
   ID     Name  Age      City   Salary
0   1    Alice   25  New York  60000.0
1   2      Bob   30    London  75000.0
2   3  Charlie   35       NaN      NaN
3   4    David   28     Paris  62000.0
4   5      Eva   24    Berlin  57000.0
   ID   Name   Age      City  Salary
0   1  Alice  25.0  New York   60000
1   2    Bob  30.0    London   75000
2   4  David  28.0     Paris   62000
3   5    Eva  24.0    Berlin   57000
4   6    NaN   NaN     Tokyo   80000
   ID     Name   Age      City   Salary
0   1    Alice  25.0  New York  60000.0
1   2      Bob  30.0    London  75000.0
2   3  Charlie  35.0       NaN      NaN
3   4    David  28.0     Paris  62000.0
4   5      Eva  24.0    Berlin  57000.0
5   6      NaN   NaN     Tokyo  80000.0
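
When checking an outer join, it can help to know which side each row came from. One way, sketched here on two small made-up frames, is the `indicator=True` option of `pd.merge`, which adds a `_merge` column.

```python
import pandas as pd

left = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
right = pd.DataFrame({"ID": [1, 2, 6], "City": ["New York", "London", "Tokyo"]})

# _merge is 'both', 'left_only', or 'right_only' for each output row
audit = pd.merge(left, right, on="ID", how="outer", indicator=True)
print(audit[["ID", "_merge"]])

# Rows that found no partner on the other side
unmatched = audit[audit["_merge"] != "both"]
print(unmatched["ID"].tolist())
```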


# Time Series Analysis:

In [51]:
date = pd.to_datetime(df["Date"])
print(date)

date_range = pd.date_range(start='2023-08-01', periods=5, freq='D')
print(date_range)

0   2023-01-01
1   2023-01-02
2   2023-01-03
3   2023-01-04
4   2023-01-05
Name: Date, dtype: datetime64[ns]
DatetimeIndex(['2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04',
               '2023-08-05'],
              dtype='datetime64[ns]', freq='D')


In [52]:
# Creating a DataFrame with dates as the index
data = {'value': [10, 20, 15, 25]}
index_dates = pd.date_range(start='2023-08-01', periods=4, freq='D')
df1 = pd.DataFrame(data, index=index_dates)
print(df1)
print(df1.loc['2023-08-02'])

            value
2023-08-01     10
2023-08-02     20
2023-08-03     15
2023-08-04     25
value    20
Name: 2023-08-02 00:00:00, dtype: int64


In [53]:

# Creating a DataFrame with hourly data
data = {'value': [10, 20, 15, 25, 30, 35]}
index_dates = pd.date_range(start='2023-08-01', periods=6, freq='H')
df2 = pd.DataFrame(data, index=index_dates)
print(df2)
# Resample to daily frequency and calculate the mean for each day
daily_mean = df2.resample('D').mean()
print(daily_mean)

                     value
2023-08-01 00:00:00     10
2023-08-01 01:00:00     20
2023-08-01 02:00:00     15
2023-08-01 03:00:00     25
2023-08-01 04:00:00     30
2023-08-01 05:00:00     35
            value
2023-08-01   22.5
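
All six readings above fall on the same day, so resampling yields a single row. A sketch with made-up readings that span two days, and with more than one aggregation per day:

```python
import pandas as pd

# Hypothetical readings taken every 12 hours over two days
idx = pd.to_datetime([
    "2023-08-01 00:00", "2023-08-01 12:00",
    "2023-08-02 00:00", "2023-08-02 12:00",
])
readings = pd.DataFrame({"value": [10, 20, 30, 50]}, index=idx)

# One row per calendar day, with the mean and the maximum of that day
daily = readings["value"].resample("D").agg(["mean", "max"])
print(daily)
```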


In [54]:
# Shift the data down by two rows; the index stays the same and the first two rows become NaN
df_shifted = df.shift(2)
df_shifted

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate,Rank
0,,,,,,,,
1,,,,,,,,
2,0.0,Alice,25.0,New York,60000.0,2023-01-01,A,2.0
3,1.0,Bob,30.0,London,75000.0,2023-01-02,B,4.0
4,2.0,Charlie,35.0,Tokyo,80000.0,2023-01-03,C,5.0
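
A common use of `shift` in time series work is comparing each value with the previous one. A minimal sketch, reusing the daily values from the date-indexed example earlier (the names `prices`, `previous`, and `change` are chosen for illustration):

```python
import pandas as pd

prices = pd.Series(
    [10, 20, 15, 25],
    index=pd.date_range(start="2023-08-01", periods=4, freq="D"),
)

# shift(1) moves the values down one row; the first row has no predecessor
previous = prices.shift(1)

# Day-over-day change, which Series.diff() computes directly
change = prices - previous
print(change)
```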


# Handling Categorical Data:

In [55]:
# Label Encoding:
import pandas as pd

data = pd.DataFrame({'City': ['New York', 'Paris', 'London', 'Tokyo', 'Paris']})

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
data['City_LabelEncoded'] = le.fit_transform(data['City'])
print(data)

       City  City_LabelEncoded
0  New York                  1
1     Paris                  2
2    London                  0
3     Tokyo                  3
4     Paris                  2


In [56]:
# One-Hot Encoding
data = pd.DataFrame({'City': ['New York', 'Paris', 'London', 'Tokyo', 'Paris']})

one_hot_encoded = pd.get_dummies(data['City'], prefix='City')
data = pd.concat([data, one_hot_encoded], axis=1)
print(data)

       City  City_London  City_New York  City_Paris  City_Tokyo
0  New York        False           True       False       False
1     Paris        False          False        True       False
2    London         True          False       False       False
3     Tokyo        False          False       False        True
4     Paris        False          False        True       False


In [57]:
# Ordinal Encoding
data = pd.DataFrame({'Grade': ['A', 'B', 'C', 'A', 'D']})

grade_mapping = {'A': 3, 'B': 2, 'C': 1, 'D': 0}
data['Grade_OrdinalEncoded'] = data['Grade'].map(grade_mapping)
print(data)

  Grade  Grade_OrdinalEncoded
0     A                     3
1     B                     2
2     C                     1
3     A                     3
4     D                     0
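
The same ordinal codes can be obtained without a hand-written mapping by declaring an ordered categorical. This is a sketch of one alternative, not a replacement for the mapping above; the category order (D lowest, A highest) is an assumption taken from `grade_mapping`.

```python
import pandas as pd

grades = pd.Series(["A", "B", "C", "A", "D"])

# Categories listed from lowest to highest; codes follow that order
ordered = pd.Categorical(grades, categories=["D", "C", "B", "A"], ordered=True)
codes = ordered.codes
print(list(codes))

# An ordered categorical also supports comparisons such as min and max
print(ordered.max())
```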


In [58]:
df5 = df[1:].reset_index(drop=True)
df5

Unnamed: 0.1,Unnamed: 0,Name,Age,City,Salary,Date,cate,Rank
0,1,Bob,30.0,London,75000,2023-01-02,B,4.0
1,2,Charlie,35.0,Tokyo,80000,2023-01-03,C,5.0
2,3,David,28.0,Paris,62000,2023-01-04,A,3.0
3,4,Rohith,24.0,Berlin,57000,2023-01-05,B,1.0


In [3]:
import pandas as pd

# Provide the data
data = {
    'color': ['red', 'blue', 'red', 'blue', 'red', 'black', 'red', 'red'],
    'fruit': ['banana', 'banana', 'carrot', 'grape', 'carrot', 'carrot', 'banana', 'grape'],
    'v1': [1, 2, 3, 4, 5, 6, 7, 8],
    'v2': [10, 20, 30, 40, 50, 60, 70, 80]
}

# Create the Pandas DataFrame
df = pd.DataFrame(data)

# Save the DataFrame to a CSV file
df.to_csv('data1.csv')