# **Pandas Basic Course**


# **Creating Series using Pandas**:

1. **Importing pandas**: We import pandas library and nickname it 'pd' for convenience.

2. **Creating a Series**: Using pandas, we convert a list of numbers `[3, 4, 5, 6, 7]` into a Series. 

3. **Printing the Series**: We display the Series, which presents the numbers neatly, each with an associated index.

4. **Checking Series Type**: We verify that our data structure is indeed a pandas Series, specialized for data manipulation tasks.

In short, pandas turns a plain list of numbers into a Series: a one-dimensional, indexed structure with built-in methods for data handling and analysis.

In [3]:
import pandas as pd

List = [3, 4, 5, 6, 7]
ser=pd.Series(List)
print(ser)
print(type(ser))


0    3
1    4
2    5
3    6
4    7
dtype: int64
<class 'pandas.core.series.Series'>


# **Creating a Series with Custom Index:**
Using pandas, we convert the list List into a Series called ser. We also specify a custom index for the Series, using the index parameter. Each value in the list is associated with a specific index label provided in the index argument.

In [6]:
List = [3, 4, 5, 6, 7]
ser=pd.Series(List, index=['a', 'b', 'c', 'd', 'e'])
print(ser)
print(type(ser))

a    3
b    4
c    5
d    6
e    7
dtype: int64
<class 'pandas.core.series.Series'>


# Creating a Series from a Dictionary: 
Using pandas, we convert a dictionary into a Series object named ser, where each dictionary key becomes an index label and the corresponding value becomes the data point.

**1D Array Structure:** A Series is always one-dimensional. Because each value in this dictionary is itself a list, every element of the Series holds a whole list, which is why the dtype is reported as object.

**In summary**, pandas turns a dictionary into a labeled, one-dimensional Series with one entry per key.

In [4]:
dic={"name":['python','c', 'c++'],"por":[2,3,4,],"rank":[32,13,14]}
ser = pd.Series(dic)
print(ser)

name    [python, c, c++]
por            [2, 3, 4]
rank        [32, 13, 14]
dtype: object
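
For comparison, a minimal sketch of the more common case, where each dictionary value is a single scalar rather than a list. The dictionary `scores` and its values are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical data: one scalar per key instead of one list per key.
scores = {"python": 32, "c": 13, "c++": 14}

ser_scalar = pd.Series(scores)
print(ser_scalar)          # keys become the index, values stay integers
print(ser_scalar["c"])     # label-based lookup of a single element
```

With scalar values the Series gets an integer dtype, so numeric methods such as `ser_scalar.mean()` work directly.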


# Arithmetic Operations on Series

Arithmetic between two pandas Series aligns the operands on their index labels before computing. When s1 and s2 are added, labels present in both Series are added normally, while labels present in only one of them yield NaN (Not a Number). Because NaN is a floating-point value, the result has dtype float64 even though both inputs hold integers.

In [None]:
s1 = pd.Series(12,index=(1,2,3,4,5,6,7))
s2 = pd.Series(12,index=(1,2,3,4))
print(s1+s2)

1    24.0
2    24.0
3    24.0
4    24.0
5     NaN
6     NaN
7     NaN
dtype: float64
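
If NaN is not the desired result for unmatched labels, one option is the `add` method with `fill_value`. The sketch below reuses the same two Series; choosing 0 as the fill value is an assumption made for the illustration.

```python
import pandas as pd

s1 = pd.Series(12, index=[1, 2, 3, 4, 5, 6, 7])
s2 = pd.Series(12, index=[1, 2, 3, 4])

# Plain + leaves NaN wherever a label exists in only one Series.
plain = s1 + s2

# add() with fill_value=0 treats a missing label as 0 before adding.
filled = s1.add(s2, fill_value=0)
print(filled)
```

Labels 1 to 4 still sum to 24, while labels 5 to 7 keep the value 12 from s1 instead of becoming NaN.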


# Note:
If we try s1 + s2 with plain Python lists instead of Series, the + operator concatenates the lists rather than adding them element by element. NumPy arrays do support element-wise arithmetic, but they match elements by position and require compatible shapes, so two arrays of lengths 7 and 4 cannot be added.

In summary, pandas Series add index labels and automatic alignment on top of array-style arithmetic, which makes them better suited to structured data.
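
The difference can be sketched with three small containers. The values and the labels "x", "y", "z" are hypothetical.

```python
import numpy as np
import pandas as pd

a = [1, 2, 3]
b = [10, 20, 30]

list_sum = a + b                          # lists: concatenation
array_sum = np.array(a) + np.array(b)     # arrays: element-wise, by position

# Series: element-wise, matched by label even when the order differs.
series_sum = pd.Series(a, index=["x", "y", "z"]) + pd.Series(b, index=["z", "y", "x"])

print(list_sum)     # [1, 2, 3, 10, 20, 30]
print(array_sum)    # [11 22 33]
print(series_sum)
```

In the Series case the label "x" pairs 1 with 30, not with 10, because alignment follows labels rather than positions.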

# DataFrame

# **DataFrame from List**

We create a DataFrame from a Python list [1, 2, 3, 4, 5, 6].

Each element in the list becomes a row in the DataFrame.

The indices are generated automatically.

The DataFrame is printed in tabular form.

Its type is confirmed as a pandas DataFrame.

In [8]:
data = [1,2,3,4,5,6]  # named "data" so the built-in list() is not shadowed
df = pd.DataFrame(data)
print(df)
print(type(df))


   0
0  1
1  2
2  3
3  4
4  5
5  6
<class 'pandas.core.frame.DataFrame'>


# DataFrame as 2D Array

We create a DataFrame df from a 2D list list_2d.

Each inner list in list_2d represents a row in the DataFrame.

The DataFrame df is printed, presenting the data in tabular form resembling a 2D array.

In [10]:
list_2d = [[1,2,3,4,5,6],[2,3,4,5,6,7]]
df = pd.DataFrame(list_2d)
print(df)

   0  1  2  3  4  5
0  1  2  3  4  5  6
1  2  3  4  5  6  7


# DataFrame from Dictionary

We create a DataFrame from a dictionary where keys represent column names and values represent column data.

Each key-value pair in the dictionary corresponds to a column in the DataFrame.
The DataFrame is printed, displaying the data in tabular form with columns labeled 'a', 'b', 'c', 'd', and '1'.

In [9]:
dic = {"a":[1,2,3,4,5],"b":[1,2,3,4,5],"c":[1,2,3,4,5],"d":[1,2,3,4,5],1:[1,2,3,4,5]}
df = pd.DataFrame(dic)
print(df)

   a  b  c  d  1
0  1  1  1  1  1
1  2  2  2  2  2
2  3  3  3  3  3
3  4  4  4  4  4
4  5  5  5  5  5


# Arithmetic Operations on DataFrame

We create a DataFrame df from a dictionary where keys represent column names and values represent column data.

Columns 'A' and 'B' are initialized with values [1, 2, 3, 4].

Column 'C' is created by adding columns 'A' and 'B' element-wise.

The DataFrame df is printed, displaying the updated data including the calculated column 'C'.

In [22]:
df = pd.DataFrame({"A":[1,2,3,4,],"B":[1,2,3,4]})
df["C"] =df["A"] + df["B"]
print(df)

   A  B  C
0  1  1  2
1  2  2  4
2  3  3  6
3  4  4  8


In [16]:
df["C"] = df["A"] - df["B"]
print(df)

   A  B  C
0  1  1  0
1  2  2  0
2  3  3  0
3  4  4  0


In [17]:
df["C"] = df["A"] * df["B"]
print(df)

   A  B   C
0  1  1   1
1  2  2   4
2  3  3   9
3  4  4  16


In [24]:
df["C"] = df["A"] / df["B"]
print(df)

   A  B    C
0  1  1  1.0
1  2  2  1.0
2  3  3  1.0
3  4  4  1.0


# **DataFrame with Conditional Operation**

We create a DataFrame df from a dictionary where keys represent column names and values represent column data.

Columns 'A' and 'B' are initialized with values [1, 2, 3, 4].

A new column 'python' is added to the DataFrame, containing boolean values based on the condition df["A"] <= 3.

The DataFrame df is printed, displaying the updated data including the new column 'python'.

In [25]:
df = pd.DataFrame({"A":[1,2,3,4,],"B":[1,2,3,4]})
df["python"] = df["A"] <= 3
print(df)

   A  B  python
0  1  1    True
1  2  2    True
2  3  3    True
3  4  4   False
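
A boolean column like this is often used directly as a filter. The following is a minimal sketch on the same data; the names `mask`, `small` and `small_b` are introduced only for the example.

```python
import pandas as pd

df_cond = pd.DataFrame({"A": [1, 2, 3, 4], "B": [1, 2, 3, 4]})

mask = df_cond["A"] <= 3          # boolean Series, one value per row
small = df_cond[mask]             # keep only the rows where the mask is True
small_b = df_cond.loc[mask, "B"]  # same filter, restricted to column B

print(small)
```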


# DataFrame with Inserted Column

We create a DataFrame df from a dictionary where keys represent column names and values represent column data.

Columns 'A', 'B', and 'C' are initialized with values [1, 2, 3, 4, 5], [6, 7, 8, 9, 10], and [1, 3, 4, 5, 6] respectively.

A new column 'python' is inserted at index 1, containing values from column 'A'.

The DataFrame df is returned, showing the updated data with the inserted column 'python'.

In [29]:
df = pd.DataFrame({"A":[1,2,3,4,5],"B":[6,7,8,9,10],"C":[1,3,4,5,6]})
df.insert(1,"python",df["A"])
df


Unnamed: 0,A,python,B,C
0,1,1,6,1
1,2,2,7,3
2,3,3,8,4
3,4,4,9,5
4,5,5,10,6


# DataFrame with Sliced Column Assignment

We assign a new column 'python_12' to the DataFrame df by slicing values from column 'A' up to the third index.

The DataFrame df is returned, displaying the updated data with the new column 'python_12'.

In [27]:
df["python_12"] = df["A"][:3]
df

Unnamed: 0,A,python,B,C,python_12
0,1,1,6,1,1.0
1,2,2,7,3,2.0
2,3,3,8,4,3.0
3,4,4,9,5,
4,5,5,10,6,
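
The empty cells in rows 3 and 4 come from index alignment: the slice only carries labels 0 to 2, so the remaining labels receive NaN and the column becomes float. A small sketch of this behaviour, with `reindex` shown as one possible way to choose a different fill value (the column names are hypothetical):

```python
import pandas as pd

df_slice = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

# Assignment aligns on the index: labels 3 and 4 are missing from the slice.
df_slice["first_three"] = df_slice["A"].iloc[:3]

# reindex makes the alignment explicit and lets us pick the fill value.
df_slice["filled"] = df_slice["A"].iloc[:3].reindex(df_slice.index, fill_value=0)

print(df_slice)
```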


# Deletion

**DataFrame with Deleted Column POP Function**

We create a DataFrame df from a dictionary where keys represent column names and values represent column data.
Columns 'A', 'B', and 'C' are initialized with values [1, 2, 3, 4, 5], [6, 7, 8, 9, 10], and [1, 3, 4, 5, 6] respectively.
Column 'B' is removed from the DataFrame with pop, which returns the removed column as a Series; we store it in df_del.
The DataFrame df is returned, displaying the updated data after removing column 'B'.

In [32]:
df = pd.DataFrame({"A":[1,2,3,4,5],"B":[6,7,8,9,10],"C":[1,3,4,5,6]})
df_del = df.pop("B")
df

Unnamed: 0,A,C
0,1,1
1,2,3
2,3,4
3,4,5
4,5,6


# **DataFrame with Deleted Column Del Function**


In [33]:

del df["A"]
df

Unnamed: 0,C
0,1
1,3
2,4
3,5
4,6


# Read CSV File
**DataFrame from CSV File**

In [50]:
df = pd.read_csv("/kaggle/input/pandas-dataset/tips.csv")
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,,No,Sun,Dinner,
2,21.01,3.50,Male,No,Sun,Dinner,3.0
3,,3.31,Male,No,,Dinner,2.0
4,24.59,3.61,Female,No,Sun,Dinner,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


In [37]:
df.head(5)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [38]:
df.tail(5)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
239,29.03,5.92,Male,No,Sat,Dinner,3
240,27.18,2.0,Female,Yes,Sat,Dinner,2
241,22.67,2.0,Male,Yes,Sat,Dinner,2
242,17.82,1.75,Male,No,Sat,Dinner,2
243,18.78,3.0,Female,No,Thur,Dinner,2


In [41]:
df[5:8]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
5,25.29,4.71,Male,No,Sun,Dinner,4
6,8.77,2.0,Male,No,Sun,Dinner,2
7,26.88,3.12,Male,No,Sun,Dinner,4


**This operation retrieves the index labels of the DataFrame df.**

In [8]:
df.index

RangeIndex(start=0, stop=244, step=1)

**This operation retrieves the column labels of the DataFrame df.**

In [9]:
df.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

**This operation provides descriptive statistics of the DataFrame df, such as count, mean, standard deviation, minimum, quartiles, and maximum values for each numeric column. describe is a method, so it must be called with parentheses; a bare df.describe only returns the bound method, whose repr prints the entire DataFrame rather than the statistics.**

In [10]:
df.describe()
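
Since `describe` only produces statistics when it is actually called, here is a minimal sketch of the call on a tiny, hypothetical DataFrame (the tips file itself is not reproduced here).

```python
import pandas as pd

# Hypothetical three-row sample with one numeric and one text column.
sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "day": ["Sun", "Sat", "Sun"],
})

stats = sample.describe()   # note the parentheses
print(stats)
```

By default only the numeric column appears in the result, with the rows count, mean, std, min, 25%, 50%, 75% and max.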


**Modifying a Specific Cell in DataFrame**

This operation sets the value of the cell in the row labeled 4 and column 'tip' to 'Python' in the DataFrame df, then displays the updated DataFrame.

In [16]:
df.loc[4,"tip"]='Python'
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,Python,,No,Sun,Dinner,
2,21.01,Python,Male,No,Sun,Dinner,3.0
3,,3.31,Male,No,,Dinner,2.0
4,24.59,Python,Female,No,Sun,Dinner,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.0,Female,Yes,Sat,Dinner,2.0
241,22.67,2.0,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


**This operation sets the value of the cell located at row index 4 and column index 1 to 'Python' in the DataFrame df, then displays the updated DataFrame.**

In [19]:
df.iloc[4,1]='Python'
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,Python,,No,Sun,Dinner,
2,21.01,Python,Male,No,Sun,Dinner,3.0
3,,Python,Male,No,,Dinner,2.0
4,24.59,3.26,Female,No,Sun,Dinner,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.0,Female,Yes,Sat,Dinner,2.0
241,22.67,2.0,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


**iloc can also be used to read the value at a specific row and column position.**

In [20]:
df.iloc[9,3]

'No'
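
With the default RangeIndex, labels and positions coincide, which hides the difference between loc and iloc. A sketch with a hypothetical non-default index (10, 20, 30) makes it visible:

```python
import pandas as pd

tips_small = pd.DataFrame({"tip": [1.01, 1.66, 3.50]}, index=[10, 20, 30])

by_label = tips_small.loc[20, "tip"]    # row LABEL 20, column label "tip"
by_position = tips_small.iloc[1, 0]     # second row, first column

try:
    tips_small.loc[1, "tip"]            # there is no row labeled 1
except KeyError:
    print("loc looks up labels, not positions")

print(by_label, by_position)
```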

# Dropping a Column

This operation returns a copy of the DataFrame df without the column labeled 'day' (axis=1 refers to columns). df itself is unchanged unless the result is assigned back.

In [24]:
df.drop('day', axis=1)

Unnamed: 0,total_bill,tip,sex,smoker,time,size
0,16.99,1.01,Female,No,Dinner,2.0
1,10.34,Python,,No,Dinner,
2,21.01,Python,Male,No,Dinner,3.0
3,,Python,Male,No,Dinner,2.0
4,24.59,3.26,Female,No,Dinner,4.0
...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Dinner,3.0
240,27.18,2.0,Female,Yes,Dinner,2.0
241,22.67,2.0,Male,Yes,Dinner,2.0
242,17.82,1.75,Male,No,Dinner,2.0
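
A short sketch of that behaviour on a hypothetical two-row DataFrame: `drop` leaves the original untouched, so the result has to be assigned to keep the change. The `columns=` keyword used here is an equivalent spelling of `axis=1`.

```python
import pandas as pd

frame = pd.DataFrame({"tip": [1.0, 2.0], "day": ["Sun", "Sat"]})

without_day = frame.drop(columns="day")   # same as frame.drop("day", axis=1)
still_there = "day" in frame.columns      # True: frame was not modified

frame = frame.drop(columns="day")         # assign back to keep the change
print(frame.columns.tolist())
```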


# Converting DataFrame to Arrays
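This process converts the data stored in a DataFrame into array form for further manipulation or analysis. Because df mixes numeric and string columns, to_numpy() and np.asarray() return a two-dimensional array of dtype object.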

In [42]:
df.index.array

<NumpyExtensionArray>
[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
 ...
 234, 235, 236, 237, 238, 239, 240, 241, 242, 243]
Length: 244, dtype: int64

In [43]:
df.to_numpy()

array([[16.99, 1.01, 'Female', ..., 'Sun', 'Dinner', 2],
       [10.34, 1.66, 'Male', ..., 'Sun', 'Dinner', 3],
       [21.01, 3.5, 'Male', ..., 'Sun', 'Dinner', 3],
       ...,
       [22.67, 2.0, 'Male', ..., 'Sat', 'Dinner', 2],
       [17.82, 1.75, 'Male', ..., 'Sat', 'Dinner', 2],
       [18.78, 3.0, 'Female', ..., 'Thur', 'Dinner', 2]], dtype=object)

In [44]:
import numpy as np
array = np.asarray(df)
array

array([[16.99, 1.01, 'Female', ..., 'Sun', 'Dinner', 2],
       [10.34, 1.66, 'Male', ..., 'Sun', 'Dinner', 3],
       [21.01, 3.5, 'Male', ..., 'Sun', 'Dinner', 3],
       ...,
       [22.67, 2.0, 'Male', ..., 'Sat', 'Dinner', 2],
       [17.82, 1.75, 'Male', ..., 'Sat', 'Dinner', 2],
       [18.78, 3.0, 'Female', ..., 'Thur', 'Dinner', 2]], dtype=object)

**Sorting DataFrame by Index**

This operation sorts the DataFrame df based on the index labels in descending order along the specified axis.

In [45]:
df.sort_index(axis=0,ascending= False)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
243,18.78,3.00,Female,No,Thur,Dinner,2
242,17.82,1.75,Male,No,Sat,Dinner,2
241,22.67,2.00,Male,Yes,Sat,Dinner,2
240,27.18,2.00,Female,Yes,Sat,Dinner,2
239,29.03,5.92,Male,No,Sat,Dinner,3
...,...,...,...,...,...,...,...
4,24.59,3.61,Female,No,Sun,Dinner,4
3,23.68,3.31,Male,No,Sun,Dinner,2
2,21.01,3.50,Male,No,Sun,Dinner,3
1,10.34,1.66,Male,No,Sun,Dinner,3


**Dropping Rows**

This operation removes the row with **index label 0** from the DataFrame df along the specified axis (axis=0, which corresponds to rows).

In [46]:
df.drop(0,axis=0)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.50,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
5,25.29,4.71,Male,No,Sun,Dinner,4
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3
240,27.18,2.00,Female,Yes,Sat,Dinner,2
241,22.67,2.00,Male,Yes,Sat,Dinner,2
242,17.82,1.75,Male,No,Sat,Dinner,2


Removes the row with **index label 1** from the DataFrame

In [47]:
df.drop(1,axis=0)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
2,21.01,3.50,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
5,25.29,4.71,Male,No,Sun,Dinner,4
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3
240,27.18,2.00,Female,Yes,Sat,Dinner,2
241,22.67,2.00,Male,Yes,Sat,Dinner,2
242,17.82,1.75,Male,No,Sat,Dinner,2


# Dropping Rows with Missing Values

This operation removes rows from the DataFrame df that contain any missing values (NaN).

In [51]:
df.dropna()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
2,21.01,3.50,Male,No,Sun,Dinner,3.0
4,24.59,3.61,Female,No,Sun,Dinner,4.0
9,14.78,3.23,Male,No,Sun,Dinner,2.0
10,10.27,1.71,Male,No,Sun,Dinner,2.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


# Dropping Rows with Missing Values in a Specific Column

This operation removes rows from the DataFrame df where the column 'day' contains missing values (NaN).

In [57]:
df.dropna(subset=['day'])

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,,No,Sun,Dinner,
2,21.01,3.50,Male,No,Sun,Dinner,3.0
4,24.59,3.61,Female,No,Sun,Dinner,4.0
5,25.29,4.71,Male,,Sun,Dinner,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


# Dropping Rows with Missing Values (In-Place)

This operation removes rows with missing values (NaN) from the DataFrame df and modifies the DataFrame in-place.

In [58]:
df.dropna(inplace=True)
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
2,21.01,3.50,Male,No,Sun,Dinner,3.0
4,24.59,3.61,Female,No,Sun,Dinner,4.0
9,14.78,3.23,Male,No,Sun,Dinner,2.0
10,10.27,1.71,Male,No,Sun,Dinner,2.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


# Dropping Rows with Any Missing Values

This operation removes rows from the DataFrame df where any column contains missing values (NaN). how="any" is the default, so the result matches a plain df.dropna().

In [59]:
df.dropna(how="any")

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
2,21.01,3.50,Male,No,Sun,Dinner,3.0
4,24.59,3.61,Female,No,Sun,Dinner,4.0
9,14.78,3.23,Male,No,Sun,Dinner,2.0
10,10.27,1.71,Male,No,Sun,Dinner,2.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


# Dropping Columns with Missing Values
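With axis=1, dropna removes every column that contains a missing value (axis=0, the default, removes rows). All seven columns remain here because the in-place dropna above already removed every row with a missing value.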

In [60]:
df.dropna(axis=1)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
2,21.01,3.50,Male,No,Sun,Dinner,3.0
4,24.59,3.61,Female,No,Sun,Dinner,4.0
9,14.78,3.23,Male,No,Sun,Dinner,2.0
10,10.27,1.71,Male,No,Sun,Dinner,2.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0
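
The dropna variants above are easier to compare on a small frame where the missing cells are visible at a glance. The three-row DataFrame below is hypothetical.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, np.nan],
    "sex": ["Female", None, "Male"],
    "time": ["Dinner", "Dinner", "Dinner"],
})

rows_any = demo.dropna()                   # drop rows with any missing value
rows_subset = demo.dropna(subset=["sex"])  # only look at the "sex" column
cols_any = demo.dropna(axis=1)             # drop columns with any missing value

print(rows_any.index.tolist())      # [0]
print(rows_subset.index.tolist())   # [0, 2]
print(cols_any.columns.tolist())    # ['time']
```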


# **Filling Missing Values**

This operation fills missing values (NaN) in the DataFrame df with the specified value 'python1'.

In [56]:
df.fillna('python1')

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,python1,No,Sun,Dinner,python1
2,21.01,3.5,Male,No,Sun,Dinner,3.0
3,python1,3.31,Male,No,python1,Dinner,2.0
4,24.59,3.61,Female,No,Sun,Dinner,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.0,Female,Yes,Sat,Dinner,2.0
241,22.67,2.0,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


# Filling Missing Values with Backward And Forward Fill

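**This operation fills missing values (NaN) in the DataFrame df using the values from the next row (backward fill). Passing method= to fillna is deprecated in recent pandas versions (2.1 and later) and raises a FutureWarning.**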

In [26]:
df.fillna(method="bfill")

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,Python,Male,No,Sun,Dinner,3.0
2,21.01,Python,Male,No,Sun,Dinner,3.0
3,24.59,Python,Male,No,Sun,Dinner,2.0
4,24.59,3.26,Female,No,Sun,Dinner,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.0,Female,Yes,Sat,Dinner,2.0
241,22.67,2.0,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


**This operation fills missing values using the values from the previous row (forward fill).**

In [27]:
df.fillna(method="ffill")

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,Python,Female,No,Sun,Dinner,2.0
2,21.01,Python,Male,No,Sun,Dinner,3.0
3,21.01,Python,Male,No,Sun,Dinner,2.0
4,24.59,3.26,Female,No,Sun,Dinner,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.0,Female,Yes,Sat,Dinner,2.0
241,22.67,2.0,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


**Newer equivalents: the dedicated ffill() and bfill() methods replace fillna(method=...).**

In [7]:
df = pd.read_csv("/kaggle/input/pandas-dataset/tips.csv")
df.ffill()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,Male,No,Sun,Dinner,3.0
2,21.01,3.50,Male,No,Sun,Dinner,3.0
3,24.59,3.31,Male,No,Sun,Dinner,2.0
4,24.59,3.61,Female,No,Sun,Dinner,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


In [None]:
df = pd.read_csv("/kaggle/input/pandas-dataset/tips.csv")
df.bfill()
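
The direction of each fill is easiest to see on a short Series. This is a minimal sketch with made-up values; the `limit` argument caps how many consecutive gaps are filled.

```python
import numpy as np
import pandas as pd

gaps = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = gaps.ffill()           # carry the previous value forward
backward = gaps.bfill()          # pull the next value backward
capped = gaps.ffill(limit=1)     # fill at most one consecutive gap

print(forward.tolist())    # [1.0, 1.0, 1.0, 4.0]
print(backward.tolist())   # [1.0, 4.0, 4.0, 4.0]
print(capped.tolist())
```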

**Filling Missing Values (In-Place)**

This operation fills missing values (NaN) in the DataFrame df with the value 12, and modifies the DataFrame in-place.

In [28]:
df.fillna(12,inplace=True)
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,Python,12,No,Sun,Dinner,12.0
2,21.01,Python,Male,No,Sun,Dinner,3.0
3,12.00,Python,Male,No,12,Dinner,2.0
4,24.59,3.26,Female,No,Sun,Dinner,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.0,Female,Yes,Sat,Dinner,2.0
241,22.67,2.0,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


# Interpolation in DataFrame

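**The interpolate() method fills missing values (NaN) in the DataFrame df by linear interpolation. With the default axis=0 it works down each column, estimating every missing value from the nearest known values above and below it and spacing the estimates evenly between them. Only numeric columns can be interpolated; missing values in text columns such as 'sex' and 'day' are left as they are.**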

In [57]:
df = pd.read_csv("/kaggle/input/pandas-dataset/tips.csv")
df.iloc[1, 6]=None
df.iloc[2, 6]=None
df.iloc[3, 6]=None
df.iloc[4, 6]=None
df


Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,,No,Sun,Dinner,
2,21.01,3.50,Male,No,Sun,Dinner,
3,,3.31,Male,No,,Dinner,
4,24.59,3.61,Female,No,Sun,Dinner,
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


In [58]:
df.interpolate()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,,No,Sun,Dinner,2.4
2,21.01,3.50,Male,No,Sun,Dinner,2.8
3,22.80,3.31,Male,No,,Dinner,3.2
4,24.59,3.61,Female,No,Sun,Dinner,3.6
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


In [62]:
df.interpolate(method="linear", axis=0)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,,No,Sun,Dinner,2.4
2,21.01,3.50,Male,No,Sun,Dinner,2.8
3,22.80,3.31,Male,No,,Dinner,3.2
4,24.59,3.61,Female,No,Sun,Dinner,3.6
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


In [63]:
df.interpolate(limit=2)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,,No,Sun,Dinner,2.4
2,21.01,3.50,Male,No,Sun,Dinner,2.8
3,22.80,3.31,Male,No,,Dinner,
4,24.59,3.61,Female,No,Sun,Dinner,
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0
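
The numbers in the size column can be reproduced on a standalone Series. The sketch assumes, as in the outputs above, a known value of 2 before the gap and 4 after it.

```python
import numpy as np
import pandas as pd

size = pd.Series([2.0, np.nan, np.nan, np.nan, np.nan, 4.0])

# Four gaps between 2 and 4 give five equal steps of 0.4.
full = size.interpolate()

# limit=2 fills only the first two consecutive gaps and leaves the rest as NaN.
limited = size.interpolate(limit=2)

print(full.round(1).tolist())
print(limited.round(1).tolist())
```

Working on a single numeric column (or on `df.select_dtypes("number")`) also sidesteps the warning that recent pandas versions emit when interpolate is called on text columns.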


# Replacing Values

**This operation replaces all occurrences of the value "Dinner" with the value 35 in the DataFrame df.**

In [31]:
df.replace(to_replace="Dinner",value=35)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,35,2.0
1,10.34,Python,12,No,Sun,35,12.0
2,21.01,Python,Male,No,Sun,35,3.0
3,12.00,Python,Male,No,12,35,2.0
4,24.59,3.26,Female,No,Sun,35,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,35,3.0
240,27.18,2.0,Female,Yes,Sat,35,2.0
241,22.67,2.0,Male,Yes,Sat,35,2.0
242,17.82,1.75,Male,No,Sat,35,2.0


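**A list of values can be replaced in one call; here every cell equal to an integer from 1 to 9 becomes 12.**

In [32]: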
df.replace([1,2,3,4,5,6,7,8,9], 12)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,12.0
1,10.34,Python,12,No,Sun,Dinner,12.0
2,21.01,Python,Male,No,Sun,Dinner,12.0
3,12.00,Python,Male,No,12,Dinner,12.0
4,24.59,3.26,Female,No,Sun,Dinner,12.0
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,12.0
240,27.18,12,Female,Yes,Sat,Dinner,12.0
241,22.67,12,Male,Yes,Sat,Dinner,12.0
242,17.82,1.75,Male,No,Sat,Dinner,12.0


# Replacing Values Using Regular Expression

**This operation replaces every alphabetic character in the string cells of the DataFrame df with "12" using a regular expression, so a six-letter value such as "Female" becomes "121212121212".**

In [36]:
df.replace("[A-Za-z]", "12", regex=True)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,121212121212,1212,121212,121212121212,2.0
1,10.34,121212121212,12,1212,121212,121212121212,12.0
2,21.01,121212121212,12121212,1212,121212,121212121212,3.0
3,12.00,121212121212,12121212,1212,12,121212121212,2.0
4,24.59,3.26,121212121212,1212,121212,121212121212,4.0
...,...,...,...,...,...,...,...
239,29.03,5.92,12121212,1212,121212,121212121212,3.0
240,27.18,2.0,121212121212,121212,121212,121212121212,2.0
241,22.67,2.0,12121212,121212,121212,121212121212,2.0
242,17.82,1.75,12121212,1212,121212,121212121212,2.0


In [50]:
df = pd.read_csv("/kaggle/input/pandas-dataset/tips.csv")
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,,No,Sun,Dinner,
2,21.01,3.50,Male,No,Sun,Dinner,
3,,3.31,Male,No,,Dinner,
4,24.59,3.61,Female,No,Sun,Dinner,
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0


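**This code replaces every occurrence of the value 2 in the DataFrame `df` with the value from the previous row (forward fill). The method and limit parameters of replace are deprecated in recent pandas versions (2.1 and later) and raise a FutureWarning.**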

In [49]:
df.iloc[1, 6]=None
df.iloc[2, 6]=None
df.iloc[3, 6]=None
df.iloc[4, 6]=None

In [53]:
df.replace(2, method="ffill")

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,,No,Sun,Dinner,
2,21.01,3.50,Male,No,Sun,Dinner,
3,,3.31,Male,No,,Dinner,
4,24.59,3.61,Female,No,Sun,Dinner,
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,5.92,Female,Yes,Sat,Dinner,3.0
241,22.67,5.92,Male,Yes,Sat,Dinner,3.0
242,17.82,1.75,Male,No,Sat,Dinner,3.0


In [54]:
df.replace(2, method="ffill", limit=2)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2.0
1,10.34,1.66,,No,Sun,Dinner,
2,21.01,3.50,Male,No,Sun,Dinner,
3,,3.31,Male,No,,Dinner,
4,24.59,3.61,Female,No,Sun,Dinner,
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3.0
240,27.18,5.92,Female,Yes,Sat,Dinner,3.0
241,22.67,5.92,Male,Yes,Sat,Dinner,3.0
242,17.82,1.75,Male,No,Sat,Dinner,2.0
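
Because `replace(..., method="ffill")` is deprecated, one possible substitute is to mask the unwanted value and then forward fill. This is a sketch of that idea, not an official drop-in replacement: unlike the original call, it also fills any NaN that was already present in the data.

```python
import pandas as pd

values = pd.Series([3.0, 2.0, 2.0, 5.0])

# mask() turns every 2 into NaN; ffill() then copies the previous value down.
replaced = values.mask(values == 2).ffill()

print(replaced.tolist())   # [3.0, 3.0, 3.0, 5.0]
```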


# Merging DataFrames on Column 'A'

**This operation merges DataFrames df1 and df2 based on the common column 'A'. The resulting DataFrame contains columns 'A', 'B', and 'C', where values from both DataFrames are matched based on their corresponding values in column 'A'.**

In [10]:
import pandas as pd
df1 = pd.DataFrame({"A":[1,2,3,4,5],"B":[2,4,6,8,10]})
df2 = pd.DataFrame({"A":[1,2,3,4,5],"C":[22,44,66,88,110]})
pd.merge(df1,df2,on="A")

Unnamed: 0,A,B,C
0,1,2,22
1,2,4,44
2,3,6,66
3,4,8,88
4,5,10,110


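**By default merge performs an inner join, so only the keys present in both DataFrames (1, 2 and 3) appear in the result.**

In [12]: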
df1 = pd.DataFrame({"A":[1,2,3,4,5],"B":[2,4,6,8,10]})
df2 = pd.DataFrame({"A":[1,2,3],"C":[22,44,66]})
pd.merge(df1,df2, on="A")

Unnamed: 0,A,B,C
0,1,2,22
1,2,4,44
2,3,6,66


**A left merge keeps every row of df1; rows without a match in df2 get NaN in column 'C'.**

In [76]:
pd.merge(df1,df2, how="left")

Unnamed: 0,A,B,C
0,1,2,22.0
1,2,4,44.0
2,3,6,66.0
3,4,8,
4,5,10,


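**Merging on the index instead of a column: left_index=True and right_index=True match rows by index label, and the suffixes are appended to the overlapping column name 'A'.**

In [83]: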
pd.merge(df1,df2,right_index=True, left_index=True, suffixes=("Pandas", "name"))

Unnamed: 0,APandas,B,Aname,C
0,1,2,1,22
1,2,4,2,44
2,3,6,3,66
3,4,8,4,88
4,5,10,5,110
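
To see which side each row of a merge came from, the `how` and `indicator` parameters can be combined. The sketch below uses hypothetical frames in which key 6 exists only on the right.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10]})
right = pd.DataFrame({"A": [1, 2, 3, 6], "C": [22, 44, 66, 132]})

inner = pd.merge(left, right, on="A")                # keys in both: 1, 2, 3
outer = pd.merge(left, right, on="A", how="outer",   # keys from either side
                 indicator=True)                     # adds a "_merge" column

print(outer)
```

The `_merge` column reads "both", "left_only" or "right_only" for each row, which helps when checking why rows gained NaN values.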


# Concatenating DataFrames Vertically

This operation concatenates DataFrames df1 and df2 along the rows (axis=0). The resulting DataFrame stacks df1 on top of df2, combining their rows. Columns from both DataFrames are included, and missing values are filled with NaN where columns do not match.

In [5]:
pd.concat([df1, df2], axis=0)

Unnamed: 0,A,B,C
0,1,2.0,
1,2,4.0,
2,3,6.0,
3,4,8.0,
4,5,10.0,
0,1,,22.0
1,2,,44.0
2,3,,66.0
3,4,,88.0
4,5,,110.0
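
Vertical concatenation keeps the original row labels, so the index repeats (0 to 4 twice above). A minimal sketch of two common variations, using small hypothetical frames: `ignore_index=True` renumbers the rows, and `axis=1` places the frames side by side instead.

```python
import pandas as pd

top = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10]})
bottom = pd.DataFrame({"A": [1, 2, 3], "C": [22, 44, 66]})

stacked = pd.concat([top, bottom], axis=0, ignore_index=True)  # fresh 0..7 index
side_by_side = pd.concat([top, bottom], axis=1)                # align on row labels

print(stacked)
print(side_by_side)
```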


# Joining DataFrames

This operation joins DataFrames df1 and df2 based on their index. Since df1 has more rows than df2, the resulting DataFrame will include all rows from df1 and align the rows from df2 based on the index. Missing values will be filled with NaN where df2 does not have corresponding rows.

In [16]:
df1 = pd.DataFrame({"A":[1,2,3,4,5],"B":[2,4,6,8,10]})
df2 = pd.DataFrame({"C":[1,2,3],"D":[22,44,66]})
df1.join(df2)

Unnamed: 0,A,B,C,D
0,1,2,1.0,22.0
1,2,4,2.0,44.0
2,3,6,3.0,66.0
3,4,8,,
4,5,10,,


In [25]:
dic=pd.DataFrame({"Name":['a','b','c','a','a','c','c','a','a','b'], 
     "Sub1":[12,43,55,23,43,12,53,55,43,12],
     "ID":[1231,344,4554,45,345,45465,45,345,342,4554]})

# Grouping and Iterating Through DataFrame

**This operation groups the DataFrame dic by the "Name" column and iterates through each group, printing the group name and its corresponding DataFrame slice.**

In [27]:
df_new=dic.groupby("Name")
for x,y in df_new:
    print(x)
    print(y)
    

a
  Name  Sub1    ID
0    a    12  1231
3    a    23    45
4    a    43   345
7    a    55   345
8    a    43   342
b
  Name  Sub1    ID
1    b    43   344
9    b    12  4554
c
  Name  Sub1     ID
2    c    55   4554
5    c    12  45465
6    c    53     45


**This operation retrieves the DataFrame slice corresponding to the group labeled "a" from the grouped DataFrame df_new.**

In [28]:
df_new.get_group("a")

Unnamed: 0,Name,Sub1,ID
0,a,12,1231
3,a,23,45
4,a,43,345
7,a,55,345
8,a,43,342


**Minimum value of each column within each of the three groups**

In [29]:
df_new.min()

Unnamed: 0_level_0,Sub1,ID
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
a,12,45
b,12,344
c,12,45
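
Several statistics can be computed per group in one step with `agg`. The sketch rebuilds the same data under the hypothetical name `marks` and summarises only the "Sub1" column.

```python
import pandas as pd

marks = pd.DataFrame({
    "Name": ['a', 'b', 'c', 'a', 'a', 'c', 'c', 'a', 'a', 'b'],
    "Sub1": [12, 43, 55, 23, 43, 12, 53, 55, 43, 12],
    "ID": [1231, 344, 4554, 45, 345, 45465, 45, 345, 342, 4554],
})

# One row per group, one column per requested statistic.
summary = marks.groupby("Name")["Sub1"].agg(["min", "max", "mean"])
print(summary)
```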


**Converting the grouped object into a list of (group name, DataFrame) tuples**

In [31]:
li = list(df_new)
li

[('a',
    Name  Sub1    ID
  0    a    12  1231
  3    a    23    45
  4    a    43   345
  7    a    55   345
  8    a    43   342),
 ('b',
    Name  Sub1    ID
  1    b    43   344
  9    b    12  4554),
 ('c',
    Name  Sub1     ID
  2    c    55   4554
  5    c    12  45465
  6    c    53     45)]

In [33]:
df=pd.DataFrame({"Days":[1,2,3,4,5,6], 
     "Eng":[12,43,55,23,43,12],
     "Urdu":[23,44,45,45,35,25]})

# Melting DataFrame

**This operation transforms the DataFrame df from a wide format to a long format, using "Days" as the identifier variable. Every other column is unpivoted into rows: its name goes into the "variable" column and its entries into the "value" column, giving a longer, stacked representation with one row per day and subject.**

In [36]:
pd.melt(df, id_vars=["Days"])

Unnamed: 0,Days,variable,value
0,1,Eng,12
1,2,Eng,43
2,3,Eng,55
3,4,Eng,23
4,5,Eng,43
5,6,Eng,12
6,1,Urdu,23
7,2,Urdu,44
8,3,Urdu,45
9,4,Urdu,45
10,5,Urdu,35
11,6,Urdu,25


**The var_name parameter of melt renames the "variable" column:**

In [37]:
pd.melt(df, id_vars=["Days"], var_name="Python")

Unnamed: 0,Days,Python,value
0,1,Eng,12
1,2,Eng,43
2,3,Eng,55
3,4,Eng,23
4,5,Eng,43
5,6,Eng,12
6,1,Urdu,23
7,2,Urdu,44
8,3,Urdu,45
9,4,Urdu,45
10,5,Urdu,35
11,6,Urdu,25


**The value_name parameter renames the "value" column as well:**

In [39]:
pd.melt(df, id_vars=["Days"], var_name="Subjects", value_name="Marks")

Unnamed: 0,Days,Subjects,Marks
0,1,Eng,12
1,2,Eng,43
2,3,Eng,55
3,4,Eng,23
4,5,Eng,43
5,6,Eng,12
6,1,Urdu,23
7,2,Urdu,44
8,3,Urdu,45
9,4,Urdu,45
10,5,Urdu,35
11,6,Urdu,25
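
Melting is reversible: `pivot` can turn the long table back into the wide one. A minimal sketch on a shortened, three-day version of the data (the variable names are hypothetical):

```python
import pandas as pd

wide_in = pd.DataFrame({"Days": [1, 2, 3], "Eng": [12, 43, 55], "Urdu": [23, 44, 45]})

# Wide -> long: one row per (day, subject) pair.
long_form = pd.melt(wide_in, id_vars=["Days"], var_name="Subjects", value_name="Marks")

# Long -> wide again: subjects become columns, days become the index.
wide_out = long_form.pivot(index="Days", columns="Subjects", values="Marks")

print(long_form)
print(wide_out)
```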


In [40]:
df=pd.DataFrame({"St_name": ["Ali","Asad","Ahmad","Hamza","Usman","Naseem"],"Days":[1,2,3,4,5,6], 
     "Eng":[12,43,55,23,43,12],
     "Urdu":[23,44,45,45,35,25]})

# Creating a Pivot Table with Margins

**This operation creates a pivot table from the DataFrame df, with "St_name" as the index and "Days" as the columns, aggregating the remaining columns ("Eng" and "Urdu") with the mean function. The margins=True parameter adds an "All" row and an "All" column that hold the mean across each column and each row. Because every student appears on exactly one day, most cells are NaN.**

In [44]:
df.pivot_table(index="St_name", columns="Days", aggfunc="mean", margins=True)

Unnamed: 0_level_0,Eng,Eng,Eng,Eng,Eng,Eng,Eng,Urdu,Urdu,Urdu,Urdu,Urdu,Urdu,Urdu
Days,1,2,3,4,5,6,All,1,2,3,4,5,6,All
St_name,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2
Ahmad,,,55.0,,,,55.0,,,45.0,,,,45.0
Ali,12.0,,,,,,12.0,23.0,,,,,,23.0
Asad,,43.0,,,,,43.0,,44.0,,,,,44.0
Hamza,,,,23.0,,,23.0,,,,45.0,,,45.0
Naseem,,,,,,12.0,12.0,,,,,,25.0,25.0
Usman,,,,,43.0,,43.0,,,,,35.0,,35.0
All,12.0,43.0,55.0,23.0,43.0,12.0,31.333333,23.0,44.0,45.0,45.0,35.0,25.0,36.166667
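
The aggregation step is easier to follow when a student has more than one entry per day. The data below is hypothetical and chosen so that the means are easy to verify by hand.

```python
import pandas as pd

records = pd.DataFrame({
    "St_name": ["Ali", "Ali", "Asad", "Asad"],
    "Days": [1, 1, 1, 2],
    "Eng": [10, 20, 30, 40],
})

table = records.pivot_table(index="St_name", columns="Days", values="Eng",
                            aggfunc="mean", margins=True)
print(table)
```

Ali's two entries on day 1 are averaged to 15, and the bottom-right "All" cell is the mean of all four marks, 25.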
