### Data Analysis Process

1. Asking Questions
2. Data Wrangling
3. Exploratory Data Analysis
4. Drawing Conclusions
5. Communicating Results

### 1. Asking Questions
The first step in the data analysis process is to ask clear and focused questions. This involves identifying the problem you want to solve or the hypothesis you want to test. Good questions should be specific, measurable, and relevant to the data at hand.

    1. What features will contribute to my analysis?
    2. What features are not important for my analysis?
    3. Which of the features have a strong correlation with the target variable?
    4. Do I need data preprocessing?
    5. What kind of feature manipulation/engineering is required?
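
As a rough illustration of question 3, the sketch below ranks features by the strength of their Pearson correlation with a target column. The dataset and the column names (`area`, `rooms`, `noise`, `price`) are made up for this example, and correlation only captures linear relationships, so treat the ranking as a first hint rather than a verdict.

```python
import pandas as pd

# Hypothetical toy dataset; column names are illustrative only.
df = pd.DataFrame({
    "area": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 4],
    "noise": [3, 1, 4, 1, 5],
    "price": [150, 180, 240, 300, 360],  # assumed target variable
})

# Pearson correlation of every feature with the target,
# sorted by absolute strength (strongest first).
corr_with_target = (
    df.corr()["price"]
    .drop("price")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Features near the bottom of this ranking are candidates for question 2, though a weak linear correlation alone does not prove a feature is unimportant.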

### Data Wrangling/Munging/Cleaning
Data wrangling, also known as data cleaning, is the process of transforming and mapping raw data into a more usable format. This step is crucial for ensuring the quality and reliability of the data before analysis.

    1. Handling missing values
    2. Removing duplicates
    3. Correcting inconsistencies
    4. Normalizing or scaling data
    5. Encoding categorical variables

* Gathering Data 
    - CSV files, Databases, APIs
* Assessing Data 
    - Understanding data quality, identifying missing values, and detecting outliers
* Cleaning Data 
    - Handling missing values, correcting inconsistencies, and removing duplicates
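
The cleaning steps listed above can be sketched on a tiny, made-up DataFrame. The column names, the median fill strategy, and min-max scaling are assumptions chosen for illustration; the right choices depend on the dataset.

```python
import pandas as pd

# Hypothetical raw data with missing values, a duplicate row,
# and inconsistent labels ("paris " vs "Paris").
raw = pd.DataFrame({
    "city": ["Paris", "paris ", "London", "London", "Berlin"],
    "temp": [21.0, 19.0, None, None, 25.0],
})

clean = raw.copy()

# Correct inconsistencies: trim whitespace and unify capitalization.
clean["city"] = clean["city"].str.strip().str.title()

# Remove exact duplicate rows (the two London rows are identical).
clean = clean.drop_duplicates().reset_index(drop=True)

# Handle missing values: fill with the column median.
clean["temp"] = clean["temp"].fillna(clean["temp"].median())

# Min-max scale the numeric column to the [0, 1] range.
t = clean["temp"]
clean["temp_scaled"] = (t - t.min()) / (t.max() - t.min())

# One-hot encode the categorical column.
encoded = pd.get_dummies(clean, columns=["city"])
print(encoded)
```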

### Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is the process of analyzing data sets to summarize their main characteristics, often using visual methods. EDA helps to uncover patterns, spot anomalies, test hypotheses, and check assumptions.

    1. Descriptive statistics
    2. Data visualization
    3. Identifying patterns and trends
    4. Correlation analysis
    5. Outlier detection
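
A minimal sketch of three of these steps (descriptive statistics, correlation analysis, and outlier detection) on made-up data. The 1.5 × IQR rule used here is one common convention for flagging outliers, not the only one, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample; 95 is planted as an obvious outlier.
df = pd.DataFrame({
    "age": [22, 25, 27, 30, 31, 33, 95],
    "income": [28, 32, 35, 40, 42, 45, 50],
})

# Descriptive statistics: count, mean, std, min, quartiles, max.
summary = df.describe()
print(summary)

# Correlation analysis between the numeric columns.
corr = df.corr()
print(corr)

# Outlier detection with the 1.5 * IQR rule.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
print(outliers)
```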

### Drawing Conclusions
Drawing conclusions involves interpreting the results of your analysis to answer the original questions posed. This step requires critical thinking and the ability to connect findings back to the broader context of the problem.

    1. Summarizing key findings
    2. Relating results to the original questions
    3. Considering alternative explanations
    4. Acknowledging limitations of the analysis
    5. Suggesting next steps or further research

* Machine Learning
    - Supervised Learning
    - Unsupervised Learning
    - Model Evaluation
* Inferential Statistics
    - Hypothesis Testing
    - Confidence Intervals
* Descriptive Statistics
    - Mean, Median, Mode
    - Standard Deviation, Variance
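
The descriptive statistics and confidence intervals mentioned above can be computed along these lines. The sample values are invented, and the interval uses a normal approximation for simplicity; for a sample this small, a t-distribution would usually be the more appropriate choice.

```python
import math
from statistics import NormalDist

import pandas as pd

# Hypothetical sample of measurements.
sample = pd.Series([12, 15, 15, 18, 20, 22, 24])

mean = sample.mean()
median = sample.median()
mode = sample.mode().tolist()  # mode() returns a Series because ties are possible
std = sample.std()             # sample standard deviation (ddof=1)
var = sample.var()             # sample variance (ddof=1)

# Approximate 95% confidence interval for the mean (normal approximation).
z = NormalDist().inv_cdf(0.975)
margin = z * std / math.sqrt(len(sample))
ci = (mean - margin, mean + margin)

print(mean, median, mode, std, var, ci)
```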

### Communicating Results/Data Storytelling
Communicating results effectively is essential for ensuring that stakeholders understand the insights derived from the data analysis. This step involves presenting findings in a clear and compelling manner.

    1. Creating visualizations (charts, graphs, PPTs)
    2. Writing reports or summaries
    3. Presenting findings to stakeholders
    4. Using storytelling techniques to engage the audience
    5. Providing actionable recommendations based on the analysis

### Importing Data

* Method 1: Using pandas

    ```python
    import pandas as pd
    data = pd.read_csv('data.csv')
    ```
* Method 2: From a URL

    ```python
    from io import StringIO

    import pandas as pd
    import requests

    url = "https://example.com/data.csv"
    headers = {"User-Agent": "Mozilla/5.0"}  # some servers reject requests without a User-Agent
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # stop early on HTTP errors such as 404

    df = pd.read_csv(StringIO(response.text))
    ```
* Method 3: Using SQL

    ```python
    import sqlite3
    import pandas as pd

    conn = sqlite3.connect('database.db')
    query = "SELECT * FROM table_name"
    data = pd.read_sql_query(query, conn)
    ```
* Method 4: Using Excel

    ```python
    import pandas as pd
    data = pd.read_excel('data.xlsx')
    ```
* Method 5: Using APIs

    ```python   
    import requests
    response = requests.get('https://api.example.com/data')
    data = response.json()
    ```
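
`response.json()` returns plain Python dicts and lists, not a DataFrame. One way to turn such a payload into a table is `pd.json_normalize`, sketched below. The payload shape (a `results` list with a nested `address` object) is hypothetical; real APIs differ.

```python
import pandas as pd

# Hypothetical payload, shaped like what response.json() might return.
payload = {
    "results": [
        {"id": 1, "name": "Ada", "address": {"city": "London"}},
        {"id": 2, "name": "Linus", "address": {"city": "Helsinki"}},
    ]
}

# Flatten the nested records; nested keys become dotted column names.
df = pd.json_normalize(payload["results"])
print(df.columns.tolist())
```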


### Working with CSV (Comma-Separated Values) files using pandas

* Reading a CSV file

    ```python
    import pandas as pd
    df = pd.read_csv('file.csv')
    ```
* Sep Parameter

    ```python
    df = pd.read_csv('file.csv', sep=';')  # for semicolon-separated values

    df = pd.read_csv('file.tsv', sep='\t')  # for tab-separated values

    ```
* index_col Parameter

    ```python
    df = pd.read_csv('file.csv', index_col='column_name')  # use the named column as the row index instead of the default integer index
    ```
* Header Parameter

    ```python
    df = pd.read_csv('file.csv', header=0)  # used to specify the row number to use as the column names
    ```
* usecols Parameter

    ```python
    df = pd.read_csv('file.csv', usecols=['col1', 'col2'])  # used to select specific columns to read from the CSV file
    ```
* Squeeze Parameter

    ```python
    # The squeeze=True argument was removed in pandas 2.0; call .squeeze('columns') on the result instead
    s = pd.read_csv('file.csv', usecols=['col1']).squeeze('columns')  # converts a single-column DataFrame into a Series
    ```
* Skiprows/nrows Parameter

    ```python
    df = pd.read_csv('file.csv', skiprows=[0, 2])  # a list skips those specific line numbers (0-indexed, line 0 is the header line); an integer n skips the first n lines

    df = pd.read_csv('file.csv', nrows=100)  # read only the first 100 data rows
    ```
* Encoding Parameter

    ```python
    df = pd.read_csv('file.csv', encoding='utf-8')  # used to specify the encoding of the CSV file
    ```
* on_bad_lines Parameter

    ```python
    df = pd.read_csv('file.csv', on_bad_lines='skip')  # skip lines with too many fields (replaces error_bad_lines=False, which was removed in pandas 2.0)
    ```
* Loading huge CSV files in chunks

    ```python
    chunk_size = 10000  # number of rows per chunk
    chunks = pd.read_csv('large_file.csv', chunksize=chunk_size)

    for chunk in chunks:
        # process each chunk
        print(chunk.head())
    ```
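
Printing each chunk is rarely the goal. A more typical pattern is to compute a partial result per chunk and combine the partials at the end, so the full file never has to fit in memory. The sketch below uses a small in-memory CSV as a stand-in for a large file; the column names and the tiny `chunksize` are assumptions for demonstration.

```python
from io import StringIO

import pandas as pd

# Small in-memory stand-in for a large file (hypothetical data).
csv_text = "category,amount\n" + "\n".join(
    f"{'a' if i % 2 == 0 else 'b'},{i}" for i in range(10)
)

# Aggregate each chunk separately, keeping only the small partial results.
partials = []
for chunk in pd.read_csv(StringIO(csv_text), chunksize=4):
    partials.append(chunk.groupby("category")["amount"].sum())

# Combine the per-chunk sums into overall totals per category.
totals = pd.concat(partials).groupby(level=0).sum()
print(totals)
```

This works because a sum can be built from partial sums. Statistics such as the median cannot be combined this way and need a different approach.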

### Importing Excel files using pandas

```python
import pandas as pd
df = pd.read_excel('file.xlsx', sheet_name='Sheet1')  # specify the sheet name
```

### Importing txt files using pandas

```python
import pandas as pd
df = pd.read_csv('file.txt', sep='\t')  # for tab-separated values
```

### Importing JSON files using pandas

```python
import pandas as pd
df = pd.read_json('file.json') 
```

### Importing JSON files from a URL

```python
import pandas as pd
url = 'https://api.example.com/data.json'
df = pd.read_json(url)
```


### Importing SQL data using pandas

```python
import pandas as pd
import sqlite3
conn = sqlite3.connect('database.db')
query = "SELECT * FROM table_name"
df = pd.read_sql_query(query, conn)
```
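
When a query depends on a user-supplied value, passing it through `params` is generally safer than building the SQL string by hand. The sketch below uses an in-memory SQLite database so it runs without any files; the `users` table and its columns are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "IN"), (2, "US"), (3, "IN")],
)

# The ? placeholder is filled in by the driver, not by string formatting.
query = "SELECT id, country FROM users WHERE country = ?"
df = pd.read_sql_query(query, conn, params=("IN",))
conn.close()

print(df)
```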

### Exporting Data as CSV using pandas

```python   
import pandas as pd
df.to_csv('output.csv', index=False)  # index=False to avoid writing row indices
```
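
To see what `index=False` changes, the sketch below writes a small made-up DataFrame both ways. It relies on `to_csv` returning the CSV text as a string when no path is given, which keeps the example free of files on disk.

```python
from io import StringIO

import pandas as pd

# Hypothetical data for demonstration.
df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

with_index = df.to_csv()                # row index is written as an unnamed first column
without_index = df.to_csv(index=False)  # only the data columns are written

# Reading the first version back produces an extra "Unnamed: 0" column.
reloaded = pd.read_csv(StringIO(with_index))
roundtrip = pd.read_csv(StringIO(without_index))

print(reloaded.columns.tolist())
print(roundtrip.columns.tolist())
```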

### Exporting Data as Excel using pandas

```python
import pandas as pd
df.to_excel('output.xlsx', index=False)  # index=False to avoid writing row indices
```

### Exporting Data as JSON using pandas

```python
import pandas as pd
df.to_json('output.json', orient='records', lines=True)
```
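
With `orient='records'` and `lines=True`, the output is in JSON Lines format: one JSON object per row, one row per line. Such a file has to be read back with `lines=True` as well. A minimal round-trip sketch with made-up data, kept in memory instead of on disk:

```python
from io import StringIO

import pandas as pd

# Hypothetical data for demonstration.
df = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Linus"]})

# With no path given, to_json returns the JSON text as a string.
json_lines = df.to_json(orient="records", lines=True)
print(json_lines)

# Read the JSON Lines text back into a DataFrame.
restored = pd.read_json(StringIO(json_lines), lines=True)
print(restored)
```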

### Exporting Data to SQL using pandas

```python   
import pandas as pd
import sqlite3
conn = sqlite3.connect('database.db')
df.to_sql('table_name', conn, if_exists='replace', index=False)
```
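
The `if_exists` argument decides what happens when the table already exists: `'replace'` drops and recreates it, `'append'` adds rows to it, and the default `'fail'` raises an error. The sketch below compares the first two on an in-memory SQLite database; the `measurements` table name and its columns are hypothetical.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
batch = pd.DataFrame({"id": [1, 2], "value": [10.0, 20.0]})  # hypothetical data
count_query = "SELECT COUNT(*) AS n FROM measurements"

# First write creates the table; the append adds the same two rows again.
batch.to_sql("measurements", conn, if_exists="replace", index=False)
batch.to_sql("measurements", conn, if_exists="append", index=False)
after_append = int(pd.read_sql_query(count_query, conn)["n"].iloc[0])

# Replace drops the existing table and writes only the new rows.
batch.to_sql("measurements", conn, if_exists="replace", index=False)
after_replace = int(pd.read_sql_query(count_query, conn)["n"].iloc[0])
conn.close()

print(after_append, after_replace)
```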
