# Introduction to Pandas
### 1.1 Overview of Pandas

**What is Pandas?**
Pandas is an open-source data analysis and manipulation library for Python. It provides data structures and functions needed to work with structured data seamlessly. The primary data structures in Pandas are Series and DataFrame.

**Why use Pandas for data analysis?**
Pandas is widely used for data analysis due to its powerful capabilities, including:
- Easy handling of missing data.
- Data alignment and reshaping.
- High-performance merging and joining of datasets.
- Time series functionality.
- Intuitive data manipulation and analysis tools.
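A few of these capabilities can be sketched on small made-up tables. The city names, column names, and values below are invented for illustration, and filling gaps with the column mean is just one possible strategy, not a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two small tables that share a made-up "city" key.
prices = pd.DataFrame({
    "city": ["Lima", "Quito", "Bogota"],
    "price": [100.0, np.nan, 250.0],  # one missing value
})
areas = pd.DataFrame({
    "city": ["Lima", "Bogota", "Quito"],
    "area_m2": [50, 125, 80],
})

# Missing data: count the gaps, then fill them with the column mean.
n_missing = prices["price"].isna().sum()
filled = prices.assign(price=prices["price"].fillna(prices["price"].mean()))

# Merging: join the two tables on the shared key column.
merged = filled.merge(areas, on="city", how="inner")

# Alignment: arithmetic between Series matches values by index label,
# not by position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])
total = a + b

print(merged)
print(total)
```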

**Key Pandas components: Series and DataFrame**
- **Series**: A one-dimensional labeled array capable of holding any data type. It can be thought of as a column in a table.
- **DataFrame**: A two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table and is the primary data structure used in Pandas.
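As a minimal sketch of the two structures, the snippet below builds a Series and a DataFrame by hand. The names, ages, cities, and index labels are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
ages = pd.Series([25, 32, 47], index=["A", "B", "C"], name="Age")

# A DataFrame: a table whose columns can hold different types.
people = pd.DataFrame(
    {
        "Name": ["Ana", "Ben", "Caro"],
        "Age": [25, 32, 47],
        "City": ["Lima", "Quito", "Bogota"],
    },
    index=["A", "B", "C"],
)

# Selecting a single column of a DataFrame gives back a Series.
age_column = people["Age"]

print(type(age_column))
print(people)
```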

### 1.2 Setting Up Your Environment


**Setting up a Python environment**
You can set up a Python environment using Jupyter Notebook or Anaconda. Here’s how to do it:

- **Jupyter Notebook**: Install Jupyter using pip by running `pip install notebook` in a terminal.

- **Anaconda**: Download and install Anaconda from the [official website](https://www.anaconda.com/products/distribution). Anaconda comes with Jupyter Notebook and many other useful packages pre-installed.

### 1.3 Pandas Basics

**Importing data from CSV, Excel, and JSON**
You can import data using the following functions:
- CSV: `pd.read_csv('file.csv')` (example data: [Female_Political](https://github.com/ddaeducation/data/blob/main/Female_Political_Representation2.csv))
- Excel: `pd.read_excel('file.xlsx')` (example data: [College_scorecard](https://github.com/ddaeducation/data/blob/main/college_scorecard_2021-22.xlsx))
- JSON: `pd.read_json('file.json')` (example data: [Iris_Example](https://github.com/ddaeducation/data/blob/main/iris.json))
- Yahoo Finance, through the separate `yfinance` package (imported as `yf`): `yf.download(ticker, start, end)` (example data: [Stock_data](https://finance.yahoo.com/quote/%5EGSPC/history/))
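The readers listed above accept file paths as well as file-like objects. The sketch below uses in-memory text in place of real files so that it runs without any downloads; the column names and values are invented and are not taken from the linked datasets. `pd.read_excel` is left out because it needs an extra Excel engine package to be installed. When reading from a URL, the address has to serve the file itself; a GitHub `blob` link points to an HTML page rather than the raw file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = (
    "property_type,state,price_usd\n"
    "house,Puebla,68000\n"
    "apartment,Jalisco,\n"  # empty field is read as a missing value
)
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON content: a list of records, one per row.
json_text = (
    '[{"species": "setosa", "petal_length": 1.4},'
    ' {"species": "virginica", "petal_length": 5.1}]'
)
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```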

Data Source: [Mexico-real_estate_clean](https://github.com/ddaeducation/data/blob/main/mexico_real_estate_clean.csv)

```python
import pandas as pd
from skimpy import skim
```

**Viewing data**
You can view data using:
- `df.head()`: Displays the first 5 rows of the DataFrame.
- `df.tail()`: Displays the last 5 rows of the DataFrame.
- `df.info()`: Provides a summary of the DataFrame, including the data types and non-null counts.
- `df.describe()`: Generates descriptive statistics for numerical columns.
- `skim(df)`: Prints a richer summary of the DataFrame (provided by the `skimpy` package).
- `df.columns`: Returns the column names.
- `df.dtypes`: Returns the data type of each column.
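The inspection calls above can be tried on a small hand-made DataFrame. The data here is invented for illustration, and `skim` is omitted so the sketch depends only on pandas.

```python
import pandas as pd

# Hypothetical six-row table used only to demonstrate the viewing methods.
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Caro", "Dev", "Eli", "Fay"],
    "Age": [25, 32, 47, 51, 19, 38],
    "City": ["Lima", "Quito", "Bogota", "Lima", "Quito", "Bogota"],
})

first_rows = df.head()    # first 5 rows by default
last_two = df.tail(2)     # pass a number to change how many rows are shown
df.info()                 # prints column dtypes and non-null counts
stats = df.describe()     # by default, summarizes numeric columns only
column_names = list(df.columns)

print(first_rows)
print(stats)
print(df.dtypes)
```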

**Data selection**
You can select data using:
- **Indexing and slicing**:
  - `df.loc[]`: Access a group of rows and columns by labels or a boolean array.
     - `df.loc[["A", "B"], ["Name", "Age"]]`: Selects specific rows and columns.
     - `df.loc[:, "City"]`: Selects all rows of a specific column.
  - `df.iloc[]`: Access a group of rows and columns by integer position.
     - `df.iloc[0:2, 0:2]`: Selects the first two rows and the first two columns.
     - `df.iloc[:, 1]`: Selects all rows of the second column.
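The selections listed above can be run against a small example table. The row labels `"A"`, `"B"`, `"C"` and the column names match the snippets in the list, while the values are made up. The last line is an added illustration of passing a boolean array to `df.loc[]`.

```python
import pandas as pd

# Hypothetical table with string row labels so that loc and iloc differ visibly.
df = pd.DataFrame(
    {
        "Name": ["Ana", "Ben", "Caro"],
        "Age": [25, 32, 47],
        "City": ["Lima", "Quito", "Bogota"],
    },
    index=["A", "B", "C"],
)

by_label = df.loc[["A", "B"], ["Name", "Age"]]  # rows A and B, two named columns
city = df.loc[:, "City"]                        # all rows, one column
by_position = df.iloc[0:2, 0:2]                 # first two rows, first two columns
second_col = df.iloc[:, 1]                      # all rows, second column ("Age")

# Boolean array: keep only the rows where the condition is True.
over_30 = df.loc[df["Age"] > 30]

print(by_label)
print(over_30)
```

Note that a label slice in `df.loc[]` includes its end label, whereas a position slice in `df.iloc[]` excludes its end position, as Python slices usually do.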

### 1.4 Hands-on Practice
#### Objective:
The goal of this assignment is to familiarize yourself with the Pandas library by loading and exploring the Titanic dataset. You will perform various data exploration tasks to understand the structure and characteristics of the dataset.

#### Instructions:

1. **Setup**:
   - Ensure you have Python and Jupyter Notebook installed on your machine.
   - Install the Pandas library if you haven't already by running `!pip install pandas` in a Jupyter Notebook cell.

2. **Load the Dataset**:
   - Download the Titanic dataset from [ddaeducation](https://github.com/ddaeducation/data/blob/main/titanic.csv) (see the column [Description](https://www.kaggle.com/competitions/titanic/data)) or from any other source, and save it as `titanic.csv`.
   - Load the dataset into a Pandas DataFrame by running `import pandas as pd` followed by `df = pd.read_csv('titanic.csv')`.

3. **Data Exploration**:
   - Perform the following tasks and document your findings:
     1. Display the first 10 rows of the dataset using `df.head(10)`.
     2. Display the last 10 rows of the dataset using `df.tail(10)`.
     3. Get a summary of the dataset using `df.info()`.
     4. Generate descriptive statistics for numerical columns using `df.describe()`.
     5. List all the column names in the dataset using `df.columns`.
     6. Check the data types of each column using `df.dtypes`.

4. **Data Selection (After Second Session)**:
   - Select and display the following subsets of data:
     1. All passengers who survived (use boolean indexing).
     2. The first 5 rows of the 'Age' and 'Fare' columns using `df.loc[]`.
     3. The rows corresponding to the first-class passengers using `df[df['Pclass'] == 1]`.

5. **Analysis Questions (After Second Session)**:
   - Answer the following questions based on your exploration:
     1. How many passengers were on board the Titanic?
     2. What was the average age of the passengers?
     3. How many passengers survived, and what percentage of the total does this represent?
     4. What was the average fare paid by passengers?

6. **Submission**:
   - Save your Jupyter Notebook and submit it to me as a `.ipynb` file. Ensure that your code is well commented and your findings are clearly documented.
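The selection and summary patterns used in steps 4 and 5 can be sketched on a tiny made-up table. The column names `Age`, `Fare`, and `Pclass` come from the tasks above; `Survived` as a 0/1 column is an assumption about the dataset, and all values are invented rather than taken from the Titanic data.

```python
import pandas as pd

# Hypothetical four-row table; not real Titanic records.
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0],       # assumed encoding: 1 = survived, 0 = did not
    "Pclass": [1, 3, 2, 1],
    "Age": [30.0, 22.0, None, 40.0],  # None becomes a missing value (NaN)
    "Fare": [80.0, 8.0, 20.0, 52.0],
})

# Boolean indexing: the comparison yields True/False per row,
# and indexing with it keeps the True rows.
survivors = toy[toy["Survived"] == 1]
first_class = toy[toy["Pclass"] == 1]

# loc slices by label and includes the end label, so 0:2 returns three rows here.
age_fare_head = toy.loc[0:2, ["Age", "Fare"]]

n_rows = len(toy)                            # number of rows
mean_age = toy["Age"].mean()                 # missing values are skipped by default
survival_pct = toy["Survived"].mean() * 100  # mean of a 0/1 column is a proportion

print(n_rows, mean_age, survival_pct)
```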

#### Evaluation Criteria:
- Correctness of the code and outputs.
- Clarity and organization of the notebook.
- Completeness of the analysis questions.
- Proper use of Pandas functions and methods.

### Good Luck!
This assignment will help you gain hands-on experience with data manipulation and analysis using Pandas.