**What is Pandas?**

Pandas is an open-source data analysis and manipulation library for Python. It provides data structures and functions needed to efficiently analyze, clean, and manipulate structured data. The primary data structures in Pandas are `Series` (one-dimensional) and `DataFrame` (two-dimensional), which allow easy handling of labeled and indexed data.

---

**Importance of Pandas in Data Science**

Pandas plays a crucial role in Data Science due to the following reasons:
1. **Data Handling:** Provides easy-to-use data structures like `DataFrame` for organizing data.
2. **Data Cleaning:** Offers functions for handling missing values, duplicates, and data transformations.
3. **Data Manipulation:** Enables efficient filtering, grouping, and aggregation of large datasets.
4. **Integration:** Works well with other libraries such as NumPy, Matplotlib, and Scikit-Learn.
5. **Performance:** Built on NumPy, so most operations run as vectorized array computations rather than Python loops, which keeps them fast on large datasets.

---

**Difference between Pandas and NumPy**

| Feature | Pandas | NumPy |
|---------|--------|-------|
| Data Structure | Uses `Series` and `DataFrame` | Uses multi-dimensional arrays (`ndarray`) |
| Data Handling | Best for labeled/tabular data | Best for numerical computations |
| Indexing | Label-based (`.loc`) and position-based (`.iloc`) indexing | Integer-position indexing only (plus slicing and boolean masks); no row or column labels |
| Functionality | Data manipulation, cleaning, and analysis | High-performance mathematical and statistical operations |
| Usage | Primarily used in data science and analytics | Mainly used for numerical computing and scientific applications |
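The indexing difference in the table is easiest to see side by side. This is a small sketch using made-up fruit prices: the same values are stored once in a NumPy array (positions only) and once in a pandas `Series` (labels plus positions).

```python
import numpy as np
import pandas as pd

# Illustrative data: a NumPy array is addressed purely by integer position
prices_np = np.array([1.5, 0.25, 3.0])
print(prices_np[1])  # 0.25

# The same values in a Series, with a label attached to each one
prices_pd = pd.Series([1.5, 0.25, 3.0], index=['apple', 'banana', 'cherry'])
print(prices_pd['banana'])  # label-based lookup: 0.25
print(prices_pd.iloc[1])    # positional lookup still works: 0.25

# Underneath, the Series values are still held in a NumPy array
print(type(prices_pd.to_numpy()))  # <class 'numpy.ndarray'>
```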

---

**Installing Pandas**

To install Pandas, use the following command:
```bash
pip install pandas
```

This command will download and install the Pandas library along with its dependencies.

---

**Importing Pandas**

Once installed, you can import Pandas in a Python script or Jupyter Notebook using:
```python
import pandas as pd
```
The alias `pd` is a commonly used shorthand that makes it easier to use Pandas functions and objects.

---

**Key Features of Pandas**

1. **Efficient Handling of Structured Data**
   - Pandas provides `DataFrame` and `Series` objects that allow for easy manipulation of structured data.
   - Supports multiple file formats like CSV, Excel, SQL, and JSON.
   - Enables fast indexing and retrieval of data.

2. **Data Cleaning and Transformation**
   - Handles missing values using functions such as `fillna()` and `dropna()`.
   - Allows transformation of data types, renaming columns, and removing duplicates.
   - Supports advanced operations like applying custom functions to modify data.

3. **Merging, Joining, and Grouping**
   - Supports merging and joining of datasets using `merge()` and `join()`.
   - Enables grouping and aggregation of data using `groupby()`.
   - Provides functionalities for pivot tables and reshaping data.

4. **Time Series Functionality**
   - Handles date and time-based data efficiently.
   - Provides functionalities for resampling, shifting, and time-based indexing.
   - Supports operations on datetime data using `to_datetime()` and `Timedelta`.

5. **Data Visualization Integration**
   - Pandas integrates with visualization libraries like Matplotlib and Seaborn.
   - Provides built-in plotting through the `.plot()` method, which uses Matplotlib by default.
   - Enables quick and insightful graphical representations of data.
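Several of the features above can be sketched together on a tiny sales table. The column names and values below are made up for illustration only; the point is how cleaning (`drop_duplicates()`, `fillna()`, `to_datetime()`), merging (`merge()`), grouping (`groupby()`), and time-based resampling (`resample()`) chain together.

```python
import numpy as np
import pandas as pd

# Illustrative data: one exact duplicate row (rows 0 and 1) and one missing value
sales = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-03', '2024-01-03'],
    'store': ['A', 'A', 'B', 'A', 'B'],
    'units': [10, 10, np.nan, 5, 8],
})

# Data cleaning: drop the duplicate row, fill the missing value, fix the types
sales = sales.drop_duplicates()
sales['units'] = sales['units'].fillna(0).astype(int)
sales['date'] = pd.to_datetime(sales['date'])

# Merging: attach a city to each row from a second (also illustrative) table
stores = pd.DataFrame({'store': ['A', 'B'], 'city': ['Karachi', 'Chicago']})
merged = sales.merge(stores, on='store', how='left')

# Grouping and aggregation: total units per city
per_city = merged.groupby('city')['units'].sum()
print(per_city)

# Time series: total units per calendar day
per_day = merged.set_index('date')['units'].resample('D').sum()
print(per_day)
```

Each step returns a new object, so the pipeline can be read top to bottom without worrying about which intermediate table was modified in place.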

---

**Series (1D)**

A Pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, etc.).

Creating a Series:

### **Example: Creating a Pandas Series**

```python
import numpy as np
import pandas as pd

data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
```

```
0    10
1    20
2    30
3    40
dtype: int64
```


```python
import numpy as np
import pandas as pd

# Example 2: Series with a custom index
data2 = [1, 3, 5, 7]
index2 = ['a', 'b', 'c', 'd']
series2 = pd.Series(data2, index=index2)
print(series2)

# Example 3: Series from a dictionary (keys become the index)
data3 = {'x': 100, 'y': 200, 'z': 300}
series3 = pd.Series(data3)
print(series3)

# Example 4: Series from a scalar value (5 is repeated for each index label)
series4 = pd.Series(5, index=['a', 'b', 'c'])
print(series4)

# Example 5: Series from a NumPy array
data5 = np.array([10.5, 20.5, 30.5])
print(data5)
series5 = pd.Series(data5)
print(series5)
```

```
a    1
b    3
c    5
d    7
dtype: int64
x    100
y    200
z    300
dtype: int64
a    5
b    5
c    5
dtype: int64
[10.5 20.5 30.5]
0    10.5
1    20.5
2    30.5
dtype: float64
```


```python
import pandas as pd
import numpy as np

# From a list
data = [10, 20, 30, 40]
series1 = pd.Series(data)
print(series1)

# From a dictionary
data_dict = {'a': 100, 'b': 200, 'c': 300}
series2 = pd.Series(data_dict)
print(series2)

# From a NumPy array
array = np.array([5, 10, 15, 20])
series3 = pd.Series(array)
print(series3)
```

```
0    10
1    20
2    30
3    40
dtype: int64
a    100
b    200
c    300
dtype: int64
0     5
1    10
2    15
3    20
dtype: int64
```


```python
print(series1.iloc[0])  # Access by position
print(series2['b'])     # Access by label
```

```
10
200
```
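Using `.iloc` for positions and `.loc` for labels matters most when the index itself is made of integers that do not match the positions. A minimal sketch with an illustrative reversed index:

```python
import pandas as pd

# Integer labels 2, 1, 0 sit at positions 0, 1, 2
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s.loc[0])   # label 0 -> 30
print(s.iloc[0])  # position 0 -> 10
```

Being explicit with `.loc` or `.iloc` avoids having to remember how plain `s[0]` resolves for a given index type.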


```python
series1 += 10          # Element-wise addition: adds 10 to every value
print(series1)
print(series1.mean())  # Statistical operation: mean of 20, 30, 40, 50
```

```
0    20
1    30
2    40
3    50
dtype: int64
35.0
```
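Arithmetic between two Series is also element-wise, but values are matched by index label rather than by position. This sketch uses two small illustrative Series whose labels only partly overlap:

```python
import pandas as pd

a = pd.Series([1, 2], index=['x', 'y'])
b = pd.Series([10, 20], index=['y', 'z'])

# Only label 'y' exists in both, so 'x' and 'z' come out as NaN
total = a + b
print(total)

# fill_value=0 treats a label missing on one side as 0 instead of NaN
filled = a.add(b, fill_value=0)
print(filled)
```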


```python
# Boolean filtering: keep only values greater than 20
print(series1[series1 > 20])
```

```
1    30
2    40
3    50
dtype: int64
```
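Conditions can be combined with `&` (and) and `|` (or); each comparison needs its own parentheses because of Python's operator precedence. A short sketch on a Series holding the same values as `series1` above:

```python
import pandas as pd

s = pd.Series([20, 30, 40, 50])

# Both conditions must hold: keeps 30 and 40
print(s[(s > 20) & (s < 50)])

# isin() tests membership in a list of values: keeps 20 and 50
print(s[s.isin([20, 50])])
```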


**DataFrame (2D)**

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types, similar to a spreadsheet or SQL table.

Creating a DataFrame:

### **Example: Creating a Pandas DataFrame**


```python
import pandas as pd

data = {
    'Name': ['Hero', 'Bob', 'Charlie'],
    'Age': [19, 30, 35],
    'City': ['Karachi', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df, "\n")                  # The DataFrame
print(type(df), "\n")            # DataFrame class
print(data, "\n")                # The source dictionary
print(type(data), "\n")          # dict
print(data['Name'], "\n")        # One value of the dictionary
print(type(data['Name']), "\n")  # list
```

```
      Name  Age         City
0     Hero   19      Karachi
1      Bob   30  Los Angeles
2  Charlie   35      Chicago 

<class 'pandas.core.frame.DataFrame'> 

{'Name': ['Hero', 'Bob', 'Charlie'], 'Age': [19, 30, 35], 'City': ['Karachi', 'Los Angeles', 'Chicago']} 

<class 'dict'> 

['Hero', 'Bob', 'Charlie'] 

<class 'list'> 
```
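In the dictionary, `data['Name']` is a plain list; once the dictionary is turned into a DataFrame, each column becomes a `Series` with its own dtype, so Series methods such as `mean()` apply directly. A short sketch rebuilding the same DataFrame:

```python
import pandas as pd

data = {
    'Name': ['Hero', 'Bob', 'Charlie'],
    'Age': [19, 30, 35],
    'City': ['Karachi', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)

# Each dictionary key became a column, and each column is a Series
print(type(df['Name']))  # <class 'pandas.core.series.Series'>
print(df['Age'].mean())  # 28.0
print(df.dtypes)         # one dtype per column
```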



```python
import pandas as pd
import numpy as np

# Example 6: DataFrame from a list of lists
data6 = [[1, 'A'], [2, 'B'], [3, 'C']]
columns6 = ['Number', 'Letter']
df6 = pd.DataFrame(data6, columns=columns6)
print(df6)

# Example 7: DataFrame with a specific data type (dtype=object applies to every column)
data7 = {'col1': [1, 2, 3], 'col2': [4.5, 5.5, 6.5], 'col3': ['A', 'B', 'C']}
df7 = pd.DataFrame(data7, dtype=object)
print(df7)
print(df7.dtypes)

# Example 8: DataFrame from a list of dictionaries (each dictionary is one row)
data8 = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}]
df8 = pd.DataFrame(data8)
print(df8)
print(data8[0])

# Example 9: DataFrame with a custom row index
data9 = {'col1': [10, 20, 30], 'col2': [40, 50, 60]}
index9 = ['row1', 'row2', 'row3']
df9 = pd.DataFrame(data9, index=index9)
print(df9)
```

```python
# Example 10: DataFrame from a NumPy array
data10 = np.array([[1.1, 2],
                   [3.2, 4],
                   [5.6, 6]])
columns10 = ['X', 'Y']
df10 = pd.DataFrame(data10, columns=columns10)
print(df10, "\n")
print(df10['Y'])  # Select column 'Y' as a Series
```

```
     X    Y
0  1.1  2.0
1  3.2  4.0
2  5.6  6.0 

0    2.0
1    4.0
2    6.0
Name: Y, dtype: float64
```
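Column `Y` prints as `2.0`, `4.0`, `6.0` even though the integers 2, 4, 6 were typed in: a NumPy array holds a single dtype, so the whole array was stored as `float64` before pandas saw it. A dictionary of columns, by contrast, keeps a separate dtype per column. A minimal sketch comparing the two:

```python
import numpy as np
import pandas as pd

# One dtype for the whole array: 1.1 forces every value to float64
arr = np.array([[1.1, 2], [3.2, 4]])
df_from_array = pd.DataFrame(arr, columns=['X', 'Y'])
print(df_from_array.dtypes)  # X and Y are both float64

# One dtype per column: Y stays an integer column
df_mixed = pd.DataFrame({'X': [1.1, 3.2], 'Y': [2, 4]})
print(df_mixed.dtypes)
```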


```python
import pandas as pd
import numpy as np

# From a dictionary of lists
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']}
df1 = pd.DataFrame(data)
print(df1)

# From a dictionary of Series
df2 = pd.DataFrame({'A': pd.Series([1, 2, 3]), 'B': pd.Series([4, 5, 6])})
print(df2)
```

```python
# From a NumPy array
array_data = np.array([[1, 2, 3], [4, 5, 6]])
df3 = pd.DataFrame(array_data, columns=['X', 'Y', 'Z'])
print(df3.loc[1, 'X'])  # Value at row label 1, column 'X'
print(df3)
```

```
4
   X  Y  Z
0  1  2  3
1  4  5  6
```
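Going the other way, `to_numpy()` turns a DataFrame back into a plain 2-D array, and `.iat` reads a single value by integer position. A short sketch reusing the same array:

```python
import numpy as np
import pandas as pd

array_data = np.array([[1, 2, 3], [4, 5, 6]])
df3 = pd.DataFrame(array_data, columns=['X', 'Y', 'Z'])

# Scalar access by position: row 1, column 0
print(df3.iat[1, 0])  # 4

# Back to NumPy: labels are dropped, values and shape are kept
back = df3.to_numpy()
print(back.shape)  # (2, 3)
```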


### **Reading from a CSV/Excel File**

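Reading `.xlsx` files requires the `openpyxl` package, which pandas uses as its Excel engine (inside a Jupyter cell, prefix the command with `!`):
```bash
pip install openpyxl==3.1.2
```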


```python
import pandas as pd

# For a CSV file, use pd.read_csv() instead, e.g. pd.read_csv("SaleData.csv") (hypothetical file name)
df = pd.read_excel("SaleData.xlsx")  # Reading an Excel file (requires openpyxl)
print(df)
```

```
    OrderDate   Region  Manager   SalesMan          Item   Units  Unit_price  \
0  2018-01-06     East   Martha  Alexander    Television   95.00    1198.000   
1  2018-01-23  Central  Hermann     Shelli  Home Theater   50.00     500.000   
2  2018-02-09  Central  Hermann       Luis    Television   36.00    1198.000   
3  2018-02-26  Central  Timothy      David    Cell Phone   27.00     225.000   
4  2018-03-15     West  Timothy    Stephen    Television   56.00    1198.000   
5  2018-04-01     East   Martha  Alexander  Home Theater   60.00     500.000   
6  2018-04-18  Central   Martha     Steven    Television   75.00    1198.000   
7  2018-05-05  Central  Hermann       Luis    Television   90.00    1198.000   
8  2018-05-22     West  Douglas    Michael    Television   32.00    1198.000   
9  2018-06-08     East   Martha  Alexander  Home Theater   60.00     500.000   
10 2018-06-25  Central  Hermann      Sigal    Television   90.00    1198.000   
11 2018-07-12     East   Martha      Dia
```
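`pd.read_csv()` accepts either a file path or any file-like object, which makes it easy to try without a file on disk. This sketch wraps a few rows copied from the sales output above in `io.StringIO`; `parse_dates` converts `OrderDate` into a datetime column, matching the `datetime64[ns]` type that `read_excel` produced.

```python
import io

import pandas as pd

# A few rows from the sales data, typed in as CSV text (stands in for a real .csv file)
csv_text = """OrderDate,Region,Item,Units
2018-01-06,East,Television,95
2018-01-23,Central,Home Theater,50
2018-02-09,Central,Television,36
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['OrderDate'])
print(sample)
print(sample.dtypes)
```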

### **Understanding DataFrame Structure**

```python
# Uncomment to inspect the data further:
# print(df.head())      # First five rows
# print(df.tail())      # Last five rows
# df.info()             # Summary of the dataset (info() prints directly and returns None)
# print(df.describe())  # Statistical summary of the numeric columns
print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # Column names
print(df.index)    # Row index
print(df.dtypes)   # Data type of each column
```

```
(45, 8)
Index(['OrderDate', 'Region', 'Manager', 'SalesMan', 'Item', 'Units',
       'Unit_price', 'Sale_amt'],
      dtype='object')
RangeIndex(start=0, stop=45, step=1)
OrderDate     datetime64[ns]
Region                object
Manager               object
SalesMan              object
Item                  object
Units                float64
Unit_price           float64
Sale_amt             float64
dtype: object
```
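The inspection methods commented out above can be tried on a small frame first. This sketch uses four `Region`/`Units` pairs taken from the sales output shown earlier; `describe()` only summarizes numeric columns by default, so only `Units` appears in its result.

```python
import pandas as pd

df_small = pd.DataFrame({
    'Region': ['East', 'Central', 'Central', 'West'],
    'Units': [95.0, 50.0, 36.0, 56.0],
})

print(df_small.head(2))     # first two rows
print(df_small.tail(2))     # last two rows
df_small.info()             # prints column names, non-null counts and dtypes; returns None
print(df_small.describe())  # count, mean, std, min, quartiles, max of numeric columns
```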


### **Accessing Data in a DataFrame**

```python
# Accessing a single column:   df['column_name']
# Accessing multiple columns:  df[['col1', 'col2']]

# Label-based indexing with loc
# print(df.loc[0], "\n")              # First row
print("\n", df.loc[:, ['Region']])    # All rows, 'Region' column (returned as a DataFrame)

# Position-based indexing with iloc
# print(df.iloc[0], "\n")             # First row
print(df.iloc[:, 1], "\n")            # All rows, second column (returned as a Series)
```

```
      Region
0      East
1   Central
2   Central
3   Central
4      West
5      East
6   Central
7   Central
8      West
9      East
10  Central
11     East
12     East
13     East
14  Central
15     East
16  Central
17     East
18     East
19  Central
20  Central
21     East
22  Central
23  Central
24     East
25     West
26  Central
27  Central
28     East
29  Central
30  Central
31  Central
32     East
33  Central
34  Central
35     West
36  Central
37     West
38     West
39  Central
40  Central
41  Central
42  Central
43      NaN
44      NaN
0        East
1     Central
2     Central
3     Central
4        West
5        East
6     Central
7     Central
8        West
9        East
10    Central
11       East
12       East
13       East
14    Central
15       East
16    Central
17       East
18       East
19    Central
20    Central
21       East
22    Central
23    Central
24       East
25       West
26    Central
27    Central
28       East
29    Central
30    Central
31    Centra
```
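`loc` and `iloc` can also select rows and columns at the same time. A short sketch on the small Name/Age/City table from earlier, showing a boolean row filter with `loc` and the different slice-end rules of the two indexers:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Hero', 'Bob', 'Charlie'],
    'Age': [19, 30, 35],
    'City': ['Karachi', 'Los Angeles', 'Chicago'],
})

# loc: rows chosen by a condition, columns chosen by label
older = df.loc[df['Age'] > 25, ['Name', 'City']]
print(older)

# iloc: positions 0-1 for rows and columns (the end of the slice is excluded)
print(df.iloc[0:2, 0:2])

# loc slices by label and INCLUDES the end label, so this is the same 2 x 2 block
print(df.loc[0:1, 'Name':'Age'])
```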

### **Exercises (Basic)**

1. Create a Pandas Series with your five favorite numbers.

2. Create a DataFrame with columns Name, Age, and Salary for 5 people.

3. Convert a dictionary to a Pandas DataFrame and print its shape.

4. Import a CSV file into a DataFrame and display the first five rows.

5. Create a DataFrame from a list of dictionaries.

### **Exercises (Advanced)**
1. Create a Series with country names as indices and their capitals as values.
2. Convert a NumPy array into a DataFrame.
3. Retrieve a specific column from a DataFrame.
4. Retrieve the first three rows of a DataFrame.
5. Modify a specific row's values in a DataFrame.