## What is Pandas?

Pandas is a popular open-source data manipulation and analysis library for Python. It provides data structures for efficiently storing and manipulating large datasets, along with functions for reading and writing data in different file formats. The primary data structures in pandas are:

1. **Series:** One-dimensional labeled array capable of holding any data type. It is similar to a column in a spreadsheet or a single column in a DataFrame.

2. **DataFrame:** A two-dimensional table with labeled axes (rows and columns). It is the primary data structure used in pandas and can be thought of as a container for Series objects.

Some key features and functionalities of pandas include:

- **Integration with NumPy:** Pandas is built on top of the NumPy library, which provides high-performance numerical operations. This integration allows for seamless interaction between NumPy and pandas.

- **Data I/O:** Pandas supports various file formats, including CSV, Excel, SQL databases, and more, making it easy to import and export data.

- **Data Exploration:** It allows for easy data exploration and manipulation, such as filtering, grouping, and aggregating data.

- **Data Cleaning:** Pandas provides functions to handle missing data, duplicate values, and other common data cleaning tasks.

- **Time Series Data:** It has robust support for working with time series data, making it suitable for analyzing temporal data.
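
As a rough illustration of the features listed above, the following sketch cleans, filters, and aggregates a tiny table. The `scores` DataFrame and its column names are made up for this example and are not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; one score is missing (NaN).
scores = pd.DataFrame({
    'team': ['red', 'blue', 'red', 'blue'],
    'score': [10.0, np.nan, 30.0, 40.0],
})

# Data cleaning: replace the missing score with 0.
cleaned = scores.fillna({'score': 0.0})

# Data exploration: filter rows, then group and aggregate.
high_scores = cleaned[cleaned['score'] > 5]
mean_by_team = cleaned.groupby('team')['score'].mean()

# NumPy integration: NumPy functions accept pandas objects directly
# and return pandas objects.
root_scores = np.sqrt(cleaned['score'])

print(mean_by_team)
```

Filling missing values with 0 is only one possible choice; whether it is appropriate depends on the data.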



## Creating Series

A Series is a one-dimensional, array-like object that holds a sequence of values together with associated labels, called its index. Every Series has a single dtype, much like a NumPy array. A Series can still hold values of mixed types, in which case pandas stores them under the generic `object` dtype. Operations on such a Series depend on the type of each element, so Series of homogeneous data are usually more convenient and more predictable. Here are several ways to create a Series in pandas:

1. **From a List:**
   You can create a Series from a Python list.

    ```python
    import pandas as pd
    data_list = [1, 2, 3, 4, 5]
    series_from_list = pd.Series(data_list)
    ```

2. **From a NumPy Array:**
   Pandas Series can be created from a NumPy array.

    ```python
    import pandas as pd
    import numpy as np
    data_array = np.array([1, 2, 3, 4, 5])
    series_from_array = pd.Series(data_array)
    ```

3. **From a Dictionary:**
   Keys of the dictionary become the index of the Series, and values become the data.

    ```python
    import pandas as pd
    data_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
    series_from_dict = pd.Series(data_dict)
    ```
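
The earlier remark about mixed data types can be checked directly. The sketch below, using made-up values, compares the dtype of a homogeneous Series with that of a mixed one and shows how an arithmetic operation then depends on each element's type.

```python
import pandas as pd

homogeneous = pd.Series([1, 2, 3])
mixed = pd.Series([1, 'two', 3.0])

print(homogeneous.dtype)  # int64
print(mixed.dtype)        # object

# On an object Series, the operation is applied element by element,
# so the string is repeated while the numbers are doubled.
doubled = mixed * 2
print(doubled.tolist())   # [2, 'twotwo', 6.0]
```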

**Specifying Index:**

When you create a pandas Series from a list-like object, by default, it is assigned a numerical index. However, if you wish to customize the index labels for better identification, you can do so by using the `index` parameter. This parameter allows you to explicitly specify the index labels you want associated with each element in the Series.

For instance, consider the following code:

```python
import pandas as pd

# Sample data
data = [10, 20, 30, 40, 50]

# Default Series with numerical index
default_series = pd.Series(data)

# Creating a Series with a custom index
custom_index = ['a', 'b', 'c', 'd', 'e']
series_with_index = pd.Series(data, index=custom_index)
```

`default_series` receives the default integer index (0 through 4). In `series_with_index`, the `index` parameter assigns a custom label ('a', 'b', 'c', 'd', 'e') to each element, which makes the values easier to reference and interpret.
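
To see the difference between the two indexes, one option is to inspect the `index` attribute of each Series. This is a small sketch that rebuilds the same two Series as above.

```python
import pandas as pd

data = [10, 20, 30, 40, 50]
default_series = pd.Series(data)
series_with_index = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])

print(list(default_series.index))     # [0, 1, 2, 3, 4]
print(list(series_with_index.index))  # ['a', 'b', 'c', 'd', 'e']

# The same value is reached through a different label in each Series.
print(default_series[2])       # 30
print(series_with_index['c'])  # 30
```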

```python
import pandas as pd
import numpy as np

# Alternative: build the same Series from a NumPy array with a custom index
# data_array = np.array([1, 2, 3, 4, 5])
# series_from_array = pd.Series(data_array, index=['a', 'b', 'c', 'd', 'e'])

data_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series_from_dict = pd.Series(data_dict)
print(series_from_dict)
```

```
a    1
b    2
c    3
d    4
e    5
dtype: int64
```

## Creating DataFrame

A pandas DataFrame is a two-dimensional data structure with rows and columns. It is similar to a Google Sheets or Excel worksheet with more than one column.

Here are several ways you can create a DataFrame in pandas:

1. **From a Dictionary of Lists:**
   You can create a DataFrame from a dictionary where keys are column names and values are lists.

    ```python
    import pandas as pd

    data_dict = {'Name': ['Alice', 'Bob', 'Charlie'],
                 'Age': [25, 30, 35],
                 'City': ['New York', 'San Francisco', 'Los Angeles']}

    df = pd.DataFrame(data_dict)
    ```

2. **From a List of Lists:**
   Create a DataFrame directly from a list of lists. The inner lists represent rows.

    ```python
    import pandas as pd

    data_list = [['Alice', 25, 'New York'],
                 ['Bob', 30, 'San Francisco'],
                 ['Charlie', 35, 'Los Angeles']]

    df = pd.DataFrame(data_list, columns=['Name', 'Age', 'City'])
    ```

3. **From a List of Dictionaries:**
   If your data is in the form of a list of dictionaries, each dictionary represents a row.

    ```python
    import pandas as pd

    data_list_of_dicts = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
                          {'Name': 'Bob', 'Age': 30, 'City': 'San Francisco'},
                          {'Name': 'Charlie', 'Age': 35, 'City': 'Los Angeles'}]

    df = pd.DataFrame(data_list_of_dicts)
    ```

4. **From a NumPy Array:**
   You can create a DataFrame from a NumPy array and specify column names.

    ```python
    import pandas as pd
    import numpy as np

    data_array = np.array([[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9]])

    df = pd.DataFrame(data_array, columns=['A', 'B', 'C'])
    ```

5. **From a CSV File:**
   Read data from a CSV file and create a DataFrame.
  [(Download Iris Dataset)](https://drive.google.com/file/d/1Aj55LWNHUOv4OCS4jXVHTq9m25Cvqp-a/view?usp=share_link)

    ```python
    import pandas as pd

    df = pd.read_csv('iris.csv')
    ```
    
6. **From a Series:**
   You can create a DataFrame from a Series. The name of the Series becomes the column name.
    ```python
    import pandas as pd

    data_series = pd.Series([10, 20, 30, 40, 50], name='Numbers')

    data_frame = pd.DataFrame(data_series)
    ```

```python
import pandas as pd

data_list_of_dicts = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
                      {'Name': 'Bob', 'Age': 30, 'City': 'San Francisco'},
                      {'Name': 'Charlie', 'Age': 35, 'City': 'Los Angeles'}]

# The index parameter assigns custom row labels
df = pd.DataFrame(data_list_of_dicts, index=['a', 'c', 'd'])
print(df)
```

```
      Name  Age           City
a    Alice   25       New York
c      Bob   30  San Francisco
d  Charlie   35    Los Angeles
```


## Accessing Items of a Series

1. **Accessing by Index:** You can access elements in a Series using the index label.

   ```python
   import pandas as pd
   series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
   item_a = series['a']
   ```

2. **Accessing by Position (Integer Indexing):** You can retrieve elements by their integer positions using `iloc`, which functions similarly to selecting items based on their index positions in a Python list.

   ```python
   # Accessing by integer position
   item_0 = series.iloc[0]
   ```

3. **Slicing:** You can use slicing to select multiple items based on their positions.

   ```python
   # Slicing by integer positions
   sliced_series = series[1:3]
   ```

4. **Boolean Indexing:** You can use boolean indexing to select items based on a condition.

   ```python
   # Boolean indexing
   condition = series > 15
   filtered_series = series[condition]
   ```

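5. **Fancy Indexing:** You can pass a list of labels to select several items at once. To select by a list of integer positions instead, use `iloc`, for example `series.iloc[[0, 2]]`.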

   ```python
   # Fancy indexing
   items = series[['a', 'c']]
   ```
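
One detail worth checking when slicing is how the end of the range is treated. The sketch below reuses the three-element `series` from this section and compares a position-based slice with a label-based one.

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# iloc slices by position; the stop position is excluded.
by_position = series.iloc[1:3]

# loc slices by label; the stop label is included.
by_label = series.loc['a':'b']

# A list of positions works with iloc.
picked = series.iloc[[0, 2]]

print(by_position.tolist())  # [20, 30]
print(by_label.tolist())     # [10, 20]
print(picked.tolist())       # [10, 30]
```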

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# loc selects by index label
item_b = series.loc['b']
print(item_b)
```

```
20
```


## Accessing Columns of a DataFrame

1. **Using Bracket Notation:**
   ```python
   # Selecting a single column
   column_data = df['column_name']
   # Selecting multiple columns
   selected_columns = df[['column_name1', 'column_name2']]
   ```

2. **Using Dot Notation (if column names are valid Python identifiers):**
   ```python
   # Selecting a single column
   column_data = df.column_name
   # Note: This does not work if the column name contains spaces or special characters,
   # or if it clashes with an existing DataFrame attribute or method (e.g. 'count').
   ```

3. **Filtering Columns by Name:**
   ```python
   # Selecting columns with names containing a substring
   selected_columns = df.filter(like='partial_column_name')
   ```

4. **Selecting Columns by Data Type:**
   ```python
   # Selecting columns of a specific data type (e.g., numerical columns)
   selected_columns = df.select_dtypes(include='number')

   # Selecting object columns (e.g., strings)
   object_columns = df.select_dtypes(include='object')

   # Selecting multiple data types
   numeric_and_object_columns = df.select_dtypes(include=['number', 'object'])

   # Excluding specific data types
   non_numeric_columns = df.select_dtypes(exclude='number')
   ```
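
The placeholder names above can be swapped for a concrete table. The following sketch assumes a small DataFrame similar to the one used earlier, with an extra `City Code` column added purely for illustration, and applies each selection method to it.

```python
import pandas as pd

# Hypothetical sample data; 'City Code' is added only for this example.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
    'City Code': ['NY', 'SF', 'LA'],
})

ages = df['Age']                # bracket notation returns a Series
ages_dot = df.Age               # dot notation returns the same Series
name_age = df[['Name', 'Age']]  # a list of names returns a DataFrame

# 'City Code' contains a space, so it is only reachable with brackets.
codes = df['City Code']

# Columns whose name contains 'City'.
city_columns = df.filter(like='City')

# Numeric columns only, and everything except numeric columns.
numeric_columns = df.select_dtypes(include='number')
other_columns = df.select_dtypes(exclude='number')

print(list(city_columns.columns))     # ['City', 'City Code']
print(list(numeric_columns.columns))  # ['Age']
```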


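```python
# df is the DataFrame created in the "Creating DataFrame" section (row labels 'a', 'c', 'd')
selected_columns = df.select_dtypes(include='object')
print(selected_columns)
```

```
      Name           City
a    Alice       New York
c      Bob  San Francisco
d  Charlie    Los Angeles
```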


## Accessing Rows of a DataFrame

   - **Integer Indexing (`iloc`):** Select rows by their integer positions.
     ```python
     # Selecting a single row by position
     row = df.iloc[2]

     # Selecting multiple rows by position
     rows = df.iloc[2:5]  # selects positions 2, 3, and 4; the stop position 5 is excluded
     ```

   - **Label Indexing (`loc`):** Select rows by their index labels.
     ```python
     # Selecting a single row by label
     row = df.loc['row_label']

     # Selecting multiple rows by label; unlike iloc, both end labels are included
     rows = df.loc['row_label_1':'row_label_3']
     ```

   - **Conditional Selection:** Select items that satisfy a condition.
     ```python
     # Selecting rows based on a condition
     condition = df['column_name'] > 50
     selected_rows = df[condition]
     ```

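     You can combine conditions using the element-wise operators `&` (and), `|` (or), and `~` (not). Wrap each condition in parentheses, for example `(df['a'] > 1) & (df['b'] < 5)`, because these operators bind more tightly than comparisons.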

   - **Using the `query` Method:**
     ```python
     # Selecting rows using a query string
     selected_rows = df.query('column_name > 50')
     ```

   - **Selecting Rows with Specific Values of a column:**
     ```python
     # Selecting rows whose value in 'column_name' appears in a list of
     # accepted values (here ['value1', 'value2'])
     selected_rows = df[df['column_name'].isin(['value1', 'value2'])]
     ```
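
The row-selection techniques above can be tried on a concrete table. This sketch assumes the small Name/Age/City DataFrame used earlier and shows combined conditions next to an equivalent `query` string.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# Each comparison is wrapped in parentheses before combining.
both = df[(df['Age'] > 26) & (df['Age'] < 35)]                  # Bob
either = df[(df['Age'] < 26) | (df['City'] == 'Los Angeles')]   # Alice, Charlie
negated = df[~(df['Age'] > 26)]                                 # Alice

# The same filter as `both`, written as a query string.
same_as_both = df.query('Age > 26 and Age < 35')

print(both['Name'].tolist())     # ['Bob']
print(either['Name'].tolist())   # ['Alice', 'Charlie']
print(negated['Name'].tolist())  # ['Alice']
```

Inside a `query` string the keywords `and`, `or`, and `not` can be used in place of `&`, `|`, and `~`.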

```python
import pandas as pd

data = {
    'Name': ['Maahir', 'Bingus', 'Elmo'],
    'Age': [21, 2, 7],
    'Color': ['Brown', 'Mocha', 'White'],
    'City': ['Bridgeport', 'Dumpster', 'Beverly']
}

df = pd.DataFrame(data, index=['Me', 'Bingo', 'Elmo'])

# Combine two conditions with & (each wrapped in parentheses)
condition = (df['Age'] > 15) & (df['Name'] == 'Maahir')
selected_rows = df[condition]

# Select rows whose Age is one of the listed values
selected_rows = df[df['Age'].isin([7, 21])]
print(selected_rows)
```

```
        Name  Age  Color        City
Me    Maahir   21  Brown  Bridgeport
Elmo    Elmo    7  White     Beverly
```


## Accessing Rows and Columns

1. **Using `loc` with Row and Column Labels:** Select specific rows and columns by providing labels.

    ```python
    import pandas as pd

    # Assuming 'df' is your DataFrame
    selected_data = df.loc[['row1', 'row2'], ['col1', 'col2']]
    ```

2. **Using `iloc` with Integer Positions:** Select specific rows and columns by providing integer positions.

    ```python
    import pandas as pd

    # Assuming 'df' is your DataFrame
    selected_data = df.iloc[[0, 1], [0, 1]]
    ```
   Here, replace 0, 1 with the actual integer positions of rows and columns you want to select.

3. **Selecting a Range of Rows and Columns:** You can also use slices with `loc` and `iloc` to select ranges of rows and columns.

    ```python
    import pandas as pd

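    # Using loc: both the start and end labels are included
    selected_data_loc = df.loc['start_row':'end_row', 'start_col':'end_col']

    # Using iloc: the end positions are excluded
    # (the *_position names are placeholders for integers)
    selected_data_iloc = df.iloc[start_row_position:end_row_position, start_col_position:end_col_position]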
    ```
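
To make the range selection concrete, here is a small sketch with a made-up 3x3 DataFrame whose row labels (`r1` to `r3`) are chosen only for this example. It selects the same block once by label and once by position.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['r1', 'r2', 'r3'],
)

# loc includes the end labels 'r2' and 'B'.
by_label = df.loc['r1':'r2', 'A':'B']

# iloc excludes the end positions, so 0:2 covers positions 0 and 1.
by_position = df.iloc[0:2, 0:2]

print(by_label.equals(by_position))  # True
```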

## Modifying the Contents of a Series and DataFrame

Both Series and DataFrames are mutable, so you can modify their values in place.

1. **Modifying Series:**
```python
import pandas as pd

# Creating a Series
series = pd.Series([10, 20, 30, 40])

# Modifying a specific value
series.at[1] = 25
print(series)

```



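To append new values to a Series, use `pd.concat`. The older `Series.append` method was removed in pandas 2.0.
```python
# Appending a new value; ignore_index=True renumbers the index from 0
series = pd.concat([series, pd.Series([50])], ignore_index=True)
print(series)
```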


2. **Modifying DataFrame:**

To modify a single value in a DataFrame, provide both its row label and its column name:
```python
import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

df = pd.DataFrame(data)

# Modifying a specific value
df.at[1, 'Age'] = 31

print(df)
```

To add a new column, assign a list with one value per row to a new column name:

```python
# Adding a new column
df['Salary'] = [50000, 60000, 70000]

print(df)
```

To drop rows or columns, use `drop`, which returns a new DataFrame:
```python
# Dropping a specific row by its index label
df = df.drop(0)

# Dropping a specific column
df = df.drop('City', axis=1)

print(df)

```
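
Because `drop` returns a new object, the original DataFrame is unchanged unless the result is assigned back. The sketch below illustrates this with the same sample data, using the `index` and `columns` keyword arguments as an alternative to `axis`.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Drop the row labeled 0 and the 'City' column in one call.
trimmed = df.drop(index=0, columns='City')

print(df.shape)                # (3, 3)  the original is untouched
print(trimmed.shape)           # (2, 2)
print(list(trimmed.index))     # [1, 2]
print(list(trimmed.columns))   # ['Name', 'Age']
```

Note that the remaining rows keep their original labels (1 and 2). Calling `reset_index(drop=True)` on the result renumbers them from 0 if that is preferred.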


