# Data Analyst Crash Course - Notes (First Hour)

## Course Introduction

- **What is Python?**
  - Created by **Guido van Rossum** in 1991.
  - Named after the BBC comedy series *Monty Python's Flying Circus*.
  - Multi-purpose programming language used in **web development, AI, machine learning, data analysis, automation**.
  - Commonly used in **data analytics** for **data collection, cleaning, analysis, and visualization**.
  - Works alongside tools such as **Excel, SQL databases, Power BI, and Tableau**.
  - Open-source and supported by big tech companies.

## Running Python

### Two ways to run Python:
1. **Locally (on your computer)**
   - Install Python & Jupyter Notebook.
   - Write code in `.py` files or Notebooks.
   - Run via terminal or command prompt.
2. **Cloud (Google Colab - Recommended for beginners)**
   - Runs in the browser, no installation needed.
   - Provides free access to GPUs for faster computing.
   - Similar to Jupyter Notebook.

> For this course, we'll use **Google Colab** for the Basics section and **install Python locally** in the Advanced section.

## Getting Started with Google Colab

- **Opening Google Colab**:
  - Go to [Google Colab](https://colab.research.google.com/)
  - Click **New Notebook**.
  - Rename the notebook (e.g., `Getting_Started`).
  - Code is written in **code cells**.
  - Notes and explanations are written in **text cells (Markdown)**.

- **Running Code Cells**:
  - Press `Shift + Enter` to run a cell and move to the next one.
  - Press `Ctrl + Enter` (or `Cmd + Enter` on Mac) to run without moving to the next cell.

- **Adding Comments**:
  - Use `#` for comments in code.
  ```python
  # This is a comment
  print("Hello, Data Nerds!")  # Comment after a statement
  ```

- **Using Markdown in Text Cells**:
  ```markdown
  # Heading 1
  ## Heading 2
  **Bold text**
  *Italic text*
  - Bullet List Item
  1. Numbered List Item
  ```

## Variables in Python

- A **variable** stores a value in memory.
- Assigned using the `=` operator.

```python
salary = 100000  # Integer
job_title = "Data Analyst"  # String
remote_work = True  # Boolean
```

- **Variable Naming Rules**:
  - Must start with a **letter or underscore** (`_`).
  - Cannot start with a number.
  - Cannot contain spaces (use `_` instead).
  - Case-sensitive (`Job_Title` ≠ `job_title`).

- **Checking Variable Types**:
  ```python
  print(type(salary))  # Output: <class 'int'>
  print(type(job_title))  # Output: <class 'str'>
  print(type(remote_work))  # Output: <class 'bool'>
  ```
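
The naming rules above can be checked programmatically. A minimal sketch using the standard library's `str.isidentifier()` and `keyword.iskeyword()`; the helper name `is_valid_name` is made up for illustration, and the reserved-word check is an extra rule not listed above:

```python
import keyword

def is_valid_name(name):
    """Return True if `name` can be used as a variable name."""
    # isidentifier() enforces the letter/underscore/no-space rules;
    # iskeyword() rejects reserved words such as `if` or `class`.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["job_title", "_salary", "2nd_job", "job title", "class"]:
    print(candidate, "->", is_valid_name(candidate))
```

Only `job_title` and `_salary` pass; the other three break one rule each.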

## Python Terms & Concepts

- **Everything in Python is an Object** (including numbers, strings, functions).
- Objects belong to a **class** (e.g., integers belong to `int` class).

```python
print(type(100000))  # Output: <class 'int'>
print(type("Data Analyst"))  # Output: <class 'str'>
```

- **Functions:** Reusable operations that take inputs and return a result. Python ships with many built-in ones.
  - Example: `print()`, `len()`, `type()`
  ```python
  print("Hello")  # Prints Hello
  print(len("Data"))  # Outputs: 4
  ```

- **Attributes:** Variables inside an object.
- **Methods:** Functions inside an object.

```python
job = "data analyst"
print(job.upper())  # Method: Converts to uppercase
print(job.replace("data", "marketing"))  # Replace words
```

## Data Types in Python

| Data Type   | Example |
|------------|---------|
| Integer (`int`) | `100000` |
| Float (`float`) | `99.99` |
| String (`str`) | `'Data Analyst'` |
| Boolean (`bool`) | `True / False` |
| List (`list`) | `["Python", "SQL"]` |
| Tuple (`tuple`) | `("Data", "Analytics")` |
| Dictionary (`dict`) | `{"skill": "Python"}` |

```python
# Examples of data types
age = 25  # Integer
salary = 85000.50  # Float
name = "Alice"  # String
is_remote = True  # Boolean
skills = ["Python", "SQL", "Excel"]  # List
job_info = {"title": "Data Analyst", "company": "TechCorp"}  # Dictionary
```

## String Manipulation

- **Convert String to Upper/Lower Case**:
  ```python
  skill = "python"
  print(skill.upper())  # Output: 'PYTHON'
  print(skill.lower())  # Output: 'python'
  ```

- **Replace Text in a String**:
  ```python
  job_title = "Data Analyst"
  new_title = job_title.replace("Data", "Business")
  print(new_title)  # Output: 'Business Analyst'
  ```

- **Split a String into a List**:
  ```python
  text = "Data, Python, SQL"
  skills_list = text.split(", ")
  print(skills_list)  # Output: ['Data', 'Python', 'SQL']
  ```

---

## Summary of First Hour

✅ **Learned about Python & its uses in data analysis**  
✅ **Set up Google Colab & wrote first Python code**  
✅ **Explored variables, data types, and basic operations**  
✅ **Practiced string manipulation with built-in methods**  

---

### Next Topics
In the upcoming section, we’ll cover:
- More string operations
- Operators in Python
- Conditional Statements

🚀 **Keep coding!** Practice by writing your own examples in Google Colab. See you in the next session! 🎯



# Data Analyst Crash Course - Notes (Second Hour)

## String Methods & Operators

### String Concatenation

- Strings can be combined using the `+` operator.
  ```python
  first_name = "Data"
  last_name = "Analyst"
  full_name = first_name + " " + last_name
  print(full_name)  # Output: Data Analyst
  ```
- The `*` operator repeats a string multiple times.
  ```python
  text = "Python " * 3
  print(text)  # Output: Python Python Python
  ```

### String Length & Methods

- `len()` function returns the length of a string.
  ```python
  name = "Data Analyst"
  print(len(name))  # Output: 12
  ```
- Other string methods:
  ```python
  text = "python programming"
  print(text.upper())  # Output: PYTHON PROGRAMMING
  print(text.replace("python", "data"))  # Output: data programming
  ```

---

## Using Chatbots for Learning & Debugging

- AI chatbots like **ChatGPT** can help explain concepts and debug errors.
- Example:
  - Asking: "What are methods in Python? Explain like I'm 5."
  - Chatbot explains methods as tools that act on data types (e.g., `.upper()`, `.replace()`).
- **Troubleshooting with Chatbots:**
  - Copy & paste an error message into ChatGPT.
  - The bot will explain the issue and suggest corrections.

---

## String Formatting

### 1. Concatenation (Using `+` Operator)

```python
role = "Data Analyst"
skill = "Python"
print("Role: " + role + " | Skill Required: " + skill)
```

### 2. `.format()` Method

```python
print("Role: {} | Skill Required: {}".format(role, skill))
```

### 3. **F-Strings (Recommended)**

```python
print(f"Role: {role} | Skill Required: {skill}")
```

### 4. Old-Style `%` Formatting (Not Recommended)

```python
print("Role: %s | Skill Required: %s" % (role, skill))
```

---

## Operators in Python

### 1. Arithmetic Operators

| Operator              | Example   | Output |
| --------------------- | --------- | ------ |
| `+` (Addition)        | `5 + 3`   | `8`    |
| `-` (Subtraction)     | `10 - 2`  | `8`    |
| `*` (Multiplication)  | `4 * 2`   | `8`    |
| `/` (Division)        | `10 / 3`  | `3.333...` (always a float) |
| `//` (Floor Division) | `10 // 3` | `3`    |
| `%` (Modulus)         | `10 % 3`  | `1`    |
| `**` (Exponent)       | `2 ** 3`  | `8`    |

### 2. Assignment Operators

- `x += 1` is the same as `x = x + 1`.
- `x -= 2` is the same as `x = x - 2`.
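
The same shorthand exists for the other arithmetic operators. A short sketch with arbitrarily chosen values:

```python
x = 10
x += 1   # same as x = x + 1   -> 11
x -= 2   # same as x = x - 2   -> 9
x *= 3   # same as x = x * 3   -> 27
x //= 2  # same as x = x // 2  -> 13
print(x)  # 13
```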

### 3. Comparison Operators

| Operator | Meaning                  | Example           |
| -------- | ------------------------ | ----------------- |
| `==`     | Equal to                 | `5 == 5` → `True` |
| `!=`     | Not equal to             | `5 != 3` → `True` |
| `>`      | Greater than             | `10 > 5` → `True` |
| `<`      | Less than                | `2 < 5` → `True`  |
| `>=`     | Greater than or equal to | `5 >= 5` → `True` |
| `<=`     | Less than or equal to    | `3 <= 4` → `True` |

---

## Conditional Statements (`if`, `elif`, `else`)

- **Syntax:**

  ```python
  age = 20
  if age >= 18:
      print("Eligible to vote")
  elif age == 17:
      print("Almost eligible")
  else:
      print("Not eligible")
  ```

- **Using the `pass` keyword:** (for placeholders)

  ```python
  if age > 18:
      pass  # Will do something later
  ```

---

## Lists in Python

- **Lists store ordered items and allow duplicates.**
- **Defining a List:**
  ```python
  skills = ["Python", "SQL", "Excel"]
  ```

### List Methods

| Method     | Description              | Example                       |
| ---------- | ------------------------ | ----------------------------- |
| `append()` | Adds an item to the end  | `skills.append("Power BI")`   |
| `remove()` | Removes an item by value | `skills.remove("SQL")`        |
| `len()`    | Built-in function (not a method); returns list length | `len(skills)` → `3` |
| `insert()` | Inserts at an index      | `skills.insert(1, "Tableau")` |
| `pop()`    | Removes item at index    | `skills.pop(2)`               |
| `index()`  | Finds item position      | `skills.index("Excel")` → `2` |
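
Most of these methods change the list in place, so the order of the calls matters. A small sketch that runs them one after another on a fresh copy of the `skills` list, with the intermediate state shown in comments:

```python
skills = ["Python", "SQL", "Excel"]

skills.append("Power BI")       # ['Python', 'SQL', 'Excel', 'Power BI']
skills.remove("SQL")            # ['Python', 'Excel', 'Power BI']
skills.insert(1, "Tableau")     # ['Python', 'Tableau', 'Excel', 'Power BI']
removed = skills.pop(2)         # removes and returns 'Excel'

print(skills)                   # ['Python', 'Tableau', 'Power BI']
print(removed)                  # Excel
print(skills.index("Tableau"))  # 1
print(len(skills))              # 3
```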

### List Indexing & Slicing

```python
# Accessing elements
print(skills[0])  # Python
print(skills[-1])  # Excel (last item)

# Slicing (start:end:step)
print(skills[:2])  # ['Python', 'SQL']
print(skills[::2])  # Every second item
```

### Unpacking Lists

```python
lang1, lang2, lang3 = skills
print(lang1)  # Python

skill_imp, *skill_not_imp = skills  # '*' collects the rest of the elements into a list
print(skill_imp)  # Python
print(skill_not_imp)  # ['SQL', 'Excel']

numbers = [1, 2, 3, 4, 5]  # Named `numbers` because `list` would shadow the built-in type
_, second, *rest = numbers  # '_' is a conventional name for a value you want to ignore

print(second)  # Output: 2
print(rest)    # Output: [3, 4, 5]
```

---

## Dictionaries in Python

- **Dictionaries store key-value pairs.**
- **Defining a Dictionary:**
  ```python
  job_skills = {
      "Database": "PostgreSQL",
      "Language": "Python",
      "Library": "Pandas"
  }
  ```

### Dictionary Methods

| Method     | Description              | Example                                 |
| ---------- | ------------------------ | --------------------------------------- |
| `keys()`   | Returns keys             | `job_skills.keys()`                     |
| `values()` | Returns values           | `job_skills.values()`                   |
| `get()`    | Retrieves a value        | `job_skills.get("Language")` → `Python` |
| `pop()`    | Removes a key-value pair | `job_skills.pop("Library")`             |
| `update()` | Adds or overwrites key-value pairs | `job_skills.update({"Cloud": "AWS"})` |

### Accessing Dictionary Values

```python
print(job_skills["Database"])  # PostgreSQL
print(job_skills.get("Language"))  # Python
```
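
The two access styles differ when the key is missing. A brief sketch, reusing the `job_skills` dictionary from above; the `"Cloud"` key and the `"Not set"` fallback are illustrative:

```python
job_skills = {"Database": "PostgreSQL", "Language": "Python", "Library": "Pandas"}

# Square brackets raise KeyError when the key is missing
try:
    job_skills["Cloud"]
except KeyError as err:
    print("Missing key:", err)  # Missing key: 'Cloud'

# .get() returns None (or a fallback you supply) instead of raising
print(job_skills.get("Cloud"))             # None
print(job_skills.get("Cloud", "Not set"))  # Not set

# .update() adds or overwrites several pairs at once
job_skills.update({"Cloud": "AWS", "Library": "NumPy"})
print(job_skills["Cloud"], job_skills["Library"])  # AWS NumPy
```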

---

## Summary of Second Hour

✅ **Learned about string operators & methods**\
✅ **Explored different string formatting techniques**\
✅ **Understood Python operators and conditional statements**\
✅ **Covered lists, list slicing, and methods**\
✅ **Learned dictionary operations and usage**

---

### Next Topics

- Sets in Python
- Tuples in Python
- More advanced data structures

🚀 **Keep coding!** See you in the next session! 🎯



# Data Analyst Crash Course - Notes (First Three Hours)

## Sets in Python

- **Definition:** Unordered collections of unique items.
- **Defined using curly brackets `{}` instead of square brackets `[]`**.

```python
job_skills = {"Python", "SQL", "Excel", "Python"}  # Duplicates removed
print(job_skills)  # Output (order may vary): {'Python', 'SQL', 'Excel'}
```

### Set Properties:
- **Unordered:** Items are not stored in any specific order.
- **No Indexing:** Cannot access elements using an index.
- **Unique Values:** Duplicates are automatically removed.

### Common Set Methods:
| Method | Description | Example |
|--------|-------------|---------|
| `add()` | Adds an element | `job_skills.add("Tableau")` |
| `remove()` | Removes an element (Error if missing) | `job_skills.remove("SQL")` |
| `discard()` | Removes an element (No error if missing) | `job_skills.discard("Power BI")` |
| `pop()` | Removes and returns an arbitrary element | `job_skills.pop()` |

```python
# Adding and removing elements
job_skills.add("Looker")
job_skills.remove("Python")
print(job_skills)
```
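
The difference between `remove()` and `discard()` only shows up when the element is absent. A minimal sketch with a small example set:

```python
job_skills = {"Python", "SQL", "Excel"}

job_skills.discard("Power BI")     # not in the set: silently does nothing

try:
    job_skills.remove("Power BI")  # not in the set: raises KeyError
except KeyError:
    print("'Power BI' is not in the set")

empty_set = set()                  # note: {} creates an empty dict, not a set
print(type(empty_set), type({}))   # <class 'set'> <class 'dict'>
print(len(job_skills))             # 3
```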

### Converting List to Set (Removing Duplicates)
```python
skills_list = ["Python", "SQL", "Python", "Excel"]
skills_set = set(skills_list)
print(skills_set)  # e.g. {'Python', 'SQL', 'Excel'} (order may vary)
```

---

## Tuples in Python

- **Definition:** Immutable ordered collection of items.
- **Defined using parentheses `()` instead of square brackets `[]`**.
- **Used when data should not be modified.**

```python
job_roles = ("Data Analyst", "Data Engineer", "ML Engineer")
```

### Tuple Properties:
- **Ordered:** Items have a defined order.
- **Immutable:** Cannot change elements after creation.
- **Supports Indexing:** Can access elements using an index.

```python
print(job_roles[0])  # Output: 'Data Analyst'
```
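
Immutability means that assigning to an index fails. A short sketch of what happens when you try:

```python
job_roles = ("Data Analyst", "Data Engineer", "ML Engineer")

try:
    job_roles[0] = "Data Scientist"  # item assignment is not allowed on tuples
except TypeError as err:
    print("TypeError:", err)

print(job_roles)  # unchanged: ('Data Analyst', 'Data Engineer', 'ML Engineer')
```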

### Converting Tuple to List (For Modification)
```python
job_roles_list = list(job_roles)
job_roles_list.append("Data Scientist")
job_roles = tuple(job_roles_list)
print(job_roles)
```

---

## Operators (Part 2)

### Logical Operators
| Operator | Description | Example |
|----------|-------------|---------|
| `and` | Returns `True` if both conditions are true | `True and False → False` |
| `or` | Returns `True` if at least one condition is true | `True or False → True` |
| `not` | Reverses a boolean value | `not True → False` |

```python
remote = True
degree_required = False
print(remote and degree_required)  # Output: False
print(remote or degree_required)  # Output: True
```

### Membership Operators
| Operator | Description | Example |
|----------|-------------|---------|
| `in` | Checks if a value is in a sequence | `'SQL' in skills_list → True` |
| `not in` | Checks if a value is not in a sequence | `'R' not in skills_list → True` |

```python
print("Python" in skills_list)  # Output: True
print("R" not in skills_list)  # Output: True
```

### Identity Operators
- **`is` vs `==`**
  - `==` checks if values are equal.
  - `is` checks if objects have the same memory location.

```python
x = [1, 2, 3]
y = [1, 2, 3]
z = x
print(x == y)  # Output: True
print(x is y)  # Output: False
print(x is z)  # Output: True
```

---

## Loops in Python

### For Loop
- Iterates over a sequence (list, tuple, set, string, dictionary).
```python
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num)
```
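
The same loop syntax works for the other iterable types. A small sketch for a dictionary and a string, using an illustrative `job_skills` dictionary:

```python
job_skills = {"Database": "PostgreSQL", "Language": "Python"}

# Looping over a dictionary yields its keys
for key in job_skills:
    print(key)

# .items() yields (key, value) pairs, unpacked here into two names
for key, value in job_skills.items():
    print(f"{key}: {value}")

# Looping over a string yields one character at a time
letters = []
for char in "SQL":
    letters.append(char)
print(letters)  # ['S', 'Q', 'L']
```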

### While Loop
- Repeats as long as a condition is `True`.
```python
x = 1
while x <= 5:
    print(x)
    x += 1
```

---

## List Comprehensions
- **Compact way to create lists using loops.**
```python
squares = [x**2 for x in range(10)]
print(squares)  # Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

- **With conditions:**
```python
even_numbers = [x for x in range(10) if x % 2 == 0]
print(even_numbers)  # Output: [0, 2, 4, 6, 8]
```

---

## Exercise: Basics
- **Goal:** Filter job roles based on required skills.
```python
my_skills = ["Python", "SQL", "Excel"]
job_roles = [
    {"title": "Data Analyst", "skills": ["Python", "SQL", "Excel"]},
    {"title": "BI Analyst", "skills": ["Python", "SQL", "Tableau"]}
]
# A role qualifies when every skill it requires is one of my skills
qualified_roles = [job["title"] for job in job_roles if all(skill in my_skills for skill in job["skills"])]
print(qualified_roles)  # Output: ['Data Analyst']
```

---

✅ **Covered Sets, Tuples, Logical Operators, Membership Operators, Loops, and List Comprehensions!**

🚀 **Next Up: Functions & Modules!**



# Data Analyst Crash Course - Notes (First Four Hours)

## Functions in Python

### Defining Functions
- Functions help **reuse code** and improve readability.
- Defined using `def` keyword.

```python
def greet(name):
    return f"Hello, {name}!"

print(greet("Data Analyst"))  # Output: Hello, Data Analyst!
```

### Function Arguments
- **Positional Arguments**: Passed in order.
- **Keyword Arguments**: Passed with parameter names.
- **Default Arguments**: Assign default values.

```python
def calculate_salary(base_salary, bonus_rate=0.1):
    return base_salary * (1 + bonus_rate)

print(calculate_salary(100000))  # Uses default 10% bonus
print(calculate_salary(100000, 0.2))  # Custom bonus rate
```
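
The example above only passes arguments by position. A short sketch of the keyword form, reusing `calculate_salary` with an arbitrary 50% bonus rate:

```python
def calculate_salary(base_salary, bonus_rate=0.1):
    return base_salary * (1 + bonus_rate)

# Positional: values are matched to parameters by order
by_position = calculate_salary(100000, 0.5)

# Keyword: values are matched by name, so the order can change
by_keyword = calculate_salary(bonus_rate=0.5, base_salary=100000)

print(by_position == by_keyword)  # True
print(by_position)                # 150000.0
```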

### Returning Multiple Values
```python
def salary_details(salary):
    bonus = salary * 0.1
    total = salary + bonus
    return bonus, total

bonus, total_salary = salary_details(100000)
print(bonus, total_salary)  # Output: 10000.0 110000.0
```

---

## Lambda Functions (Anonymous Functions)
- **One-line functions without a name**.
- Used for **short operations inside built-in functions**.

```python
double = lambda x: x * 2
print(double(5))  # Output: 10
```

- **Multiple Arguments**:
```python
add = lambda x, y: x + y
print(add(3, 4))  # Output: 7
```

- **Using `lambda` in `map()`**:
```python
numbers = [1, 2, 3, 4]
doubled = list(map(lambda x: x * 2, numbers))
print(doubled)  # Output: [2, 4, 6, 8]
```

- **Using `lambda` in `filter()`**:
```python
skills = ["Python", "SQL", "Excel", "Power BI"]
data_skills = list(filter(lambda skill: "Python" in skill, skills))
print(data_skills)  # Output: ['Python']
```

---

## Modules in Python
- **Modules** are Python files containing **functions, variables, and classes**.
- **Importing a Module**:
```python
import math
print(math.sqrt(16))  # Output: 4.0
```

- **Creating a Custom Module**:
  1. Create a file `my_module.py`.
  2. Define functions inside it.
  3. Import and use it.

`my_module.py`:
```python
def greet(name):
    return f"Hello, {name}!"
```

Using the module:
```python
import my_module
print(my_module.greet("Data Analyst"))
```

- **Importing Specific Functions**:
```python
from my_module import greet
print(greet("Data Science"))
```

- **Importing Everything (`*` Not Recommended)**:
```python
from my_module import *
```

---

## Python Libraries
- **Libraries** contain multiple modules.
- Common **Data Analysis Libraries**:
  - `pandas` - Data handling.
  - `matplotlib` - Data visualization.
  - `numpy` - Numerical computing.

- **Installing External Libraries**:
```python
!pip install pandas
```

- **Checking Installed Libraries**:
```python
!pip list
```

- **Importing a Library**:
```python
import pandas as pd
```

---

## Exercise: Python Library
- **Task**: Convert job data from strings to correct types.
- **Example Dataset**:
```python
jobs = [
    {"title": "Data Analyst", "skills": "['Python', 'SQL']", "date_posted": "2024-01-15"},
]
```

- **Convert Date from String to Datetime**:
```python
from datetime import datetime

for job in jobs:
    job["date_posted"] = datetime.strptime(job["date_posted"], "%Y-%m-%d")
```

- **Convert String List to Actual List**:
```python
import ast

for job in jobs:
    job["skills"] = ast.literal_eval(job["skills"])
```

- **Final Output**:
```python
print(jobs)
# [{'title': 'Data Analyst', 'skills': ['Python', 'SQL'], 'date_posted': datetime.datetime(2024, 1, 15, 0, 0)}]
```

---

✅ **Covered Functions, Lambda, Modules, and Python Libraries!**

🚀 **Next Up: Object-Oriented Programming (OOP) & Data Analysis with Pandas!**



# Data Analyst Crash Course - Notes (First Five Hours)

## Object-Oriented Programming (OOP) in Python

### Introduction to Classes and Objects
- **Classes** are blueprints for creating objects.
- **Objects** are instances of classes.
- **Attributes** store object properties.
- **Methods** define object behavior.

```python
class Employee:
    def __init__(self, name, salary):
        self.name = name  # Attribute
        self.salary = salary  # Attribute
    
    def show_salary(self):  # Method
        return f"{self.name}'s salary is {self.salary}"

emp1 = Employee("Alice", 100000)
print(emp1.show_salary())  # Output: Alice's salary is 100000
```

### `self` Keyword
- Represents the instance the method is called on. The name `self` is a convention, not a reserved keyword.
- Used to access attributes and methods.
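
One way to see what `self` does is to call a method through the class and pass the instance explicitly. A minimal sketch based on the `Employee` class above; the second employee is made up for illustration:

```python
class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def show_salary(self):
        return f"{self.name}'s salary is {self.salary}"

emp1 = Employee("Alice", 100000)
emp2 = Employee("Bob", 90000)

# Calling a method on an instance passes that instance as `self`,
# so these two lines are equivalent:
print(emp1.show_salary())          # Alice's salary is 100000
print(Employee.show_salary(emp1))  # Alice's salary is 100000

# Each instance keeps its own attribute values
print(emp2.show_salary())          # Bob's salary is 90000
```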

---

### Magic Methods (`__init__`, `__repr__`)

- **`__init__`**: Constructor method that initializes attributes.
- **`__repr__`**: Defines the object's string representation. `print()` falls back to it when the class does not define `__str__`.

```python
class Salary:
    def __init__(self, base_salary, bonus_rate=0.1, symbol="$"):
        self.base_salary = base_salary
        self.bonus_rate = bonus_rate
        self.symbol = symbol
        self.total_salary = base_salary * (1 + bonus_rate)
    
    def __repr__(self):
        return f"{self.symbol}{self.total_salary:,.2f}"

sal = Salary(100000)
print(sal)  # Output: $110,000.00
```

- **Formatted Output with Comma Separators:**
  ```python
  num = 1000000
  print(f"{num:,}")  # Output: 1,000,000
  ```
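
Python has a second magic method for text output, `__str__`, which these notes do not cover. A hedged sketch of how the two interact, using a simplified `Salary` class (not the one defined above):

```python
class Salary:
    def __init__(self, amount):
        self.amount = amount

    def __repr__(self):
        # Unambiguous form, shown in the notebook/REPL and inside containers
        return f"Salary({self.amount})"

    def __str__(self):
        # Reader-friendly form, used by print() and str()
        return f"${self.amount:,.2f}"

sal = Salary(110000)
print(sal)        # $110,000.00
print(repr(sal))  # Salary(110000)
print([sal])      # [Salary(110000)]
```

If only `__repr__` is defined, `print()` uses it for both cases.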

---

### Inheritance (Extending Classes)
- Allows new classes to inherit attributes and methods from existing classes.

```python
class Manager(Employee):
    def __init__(self, name, salary, department):
        super().__init__(name, salary)
        self.department = department
    
    def show_department(self):
        return f"{self.name} manages {self.department} department."

mgr = Manager("Bob", 120000, "Data Science")
print(mgr.show_department())  # Output: Bob manages Data Science department.
```

---

## NumPy: Introduction

### Why Use NumPy?
- Faster than Python lists for numerical computing.
- Supports large datasets efficiently.
- Backbone of Pandas and other libraries.

### Installing NumPy
```python
!pip install numpy
```

### Creating NumPy Arrays
```python
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)  # Output: [1 2 3 4]
```

### Performance Comparison: NumPy vs. Python Lists
```python
import random
import time

import numpy as np

# Generate a list of 1 million random salaries
salaries = [random.randint(50000, 150000) for _ in range(1000000)]
start_time = time.perf_counter()  # perf_counter() is the clock meant for timing code
mean_salary = sum(salaries) / len(salaries)
print("List Mean:", mean_salary, "Time Taken:", time.perf_counter() - start_time)

# Using NumPy (the array is built before the timer starts, so conversion is not timed)
np_salaries = np.array(salaries)
start_time = time.perf_counter()
mean_salary = np.mean(np_salaries)
print("NumPy Mean:", mean_salary, "Time Taken:", time.perf_counter() - start_time)
```

### Handling Missing Values with `np.nan`
```python
salaries_with_nan = np.array([60000, 80000, np.nan, 90000])
print(np.nanmean(salaries_with_nan))  # Output: Mean ignoring NaN values
```
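
To see why `np.nanmean` is needed, compare it with the plain mean on the same array. A small sketch:

```python
import numpy as np

salaries_with_nan = np.array([60000, 80000, np.nan, 90000])

print(np.mean(salaries_with_nan))     # nan: a single missing value propagates
print(np.nanmean(salaries_with_nan))  # mean of 60000, 80000 and 90000 only

# np.isnan() marks the missing entries, which makes them easy to count
missing_count = np.isnan(salaries_with_nan).sum()
print(missing_count)                  # 1
```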

---

## NumPy Operations

### Basic Math Operations
```python
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(arr1 + arr2)  # Output: [5 7 9]
print(arr1 * 2)  # Output: [2 4 6]
```

### Aggregation Functions
```python
print(np.mean(arr1))  # Output: Mean of array
print(np.median(arr1))  # Output: Median of array
print(np.std(arr1))  # Output: Standard deviation
```

---

✅ **Covered Object-Oriented Programming, Magic Methods, and NumPy Basics!**

🚀 **Next Up: Pandas for Data Analysis!**



# Data Analyst Crash Course - Notes (First Six Hours)

## Pandas: Data Inspection & Cleaning

### Understanding DataFrames
- A **DataFrame** is a table-like structure with rows and columns, similar to an Excel spreadsheet.
- Each column has a **data type** that determines what kind of values it holds.

```python
import pandas as pd
# Load sample dataset
df = pd.read_csv("jobs.csv")

# Display first 5 rows
df.head()
```

### Handling Missing Values
- **Checking for Missing Data:**
  ```python
  print(df.isnull().sum())  # Count missing values per column
  ```
- **Removing Rows with Missing Data:**
  ```python
  df.dropna(inplace=True)  # Removes rows with NaN values
  ```
- **Filling Missing Values:**
  ```python
  df.fillna({"salary": df["salary"].median()}, inplace=True)  # Fill NaN with median salary
  ```
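
The snippets above assume a `jobs.csv` file. A self-contained sketch of the same steps on a small made-up DataFrame (the data is hypothetical), using the non-`inplace` form so each step returns a new object:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for jobs.csv
df = pd.DataFrame({
    "job_title": ["Data Analyst", "Data Engineer", None, "Data Analyst"],
    "salary": [70000.0, np.nan, 90000.0, 80000.0],
})

print(df.isnull().sum())  # one missing value in each column

# Fill the numeric gap with the column median (80000.0 for this data)
filled = df.fillna({"salary": df["salary"].median()})

# Then drop rows that are still incomplete (the missing job title)
cleaned = filled.dropna()
print(cleaned)
```

Filling before dropping keeps the row whose only gap was the salary. Dropping first would discard it.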

---

## Pandas: Grouping & Aggregation

### Using `.groupby()` for Aggregation
- **Grouping by Job Title and Getting Median Salary:**
  ```python
  df.groupby("job_title")["salary"].median()
  ```
- **Grouping by Multiple Columns:**
  ```python
  df.groupby(["job_title", "country"])["salary"].median()
  ```
- **Aggregating Multiple Statistics:**
  ```python
  df.groupby("job_title")["salary"].agg(["min", "max", "median"])
  ```

### Sorting Grouped Data
```python
df.groupby("job_title")["salary"].median().sort_values(ascending=False)
```
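
A runnable sketch of the grouping calls above on a small made-up dataset, so the results can be checked by hand (the salaries and countries are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "job_title": ["Data Analyst", "Data Analyst", "Data Engineer",
                  "Data Engineer", "Data Engineer"],
    "country": ["US", "UK", "US", "US", "UK"],
    "salary": [70000, 60000, 110000, 130000, 90000],
})

# Median salary per job title, highest first
median_by_title = (
    df.groupby("job_title")["salary"].median().sort_values(ascending=False)
)
print(median_by_title)  # Data Engineer 110000.0, then Data Analyst 65000.0

# Several statistics at once: one column per aggregation
stats = df.groupby("job_title")["salary"].agg(["min", "max", "median"])
print(stats)
```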

---

## Matplotlib: Introduction & Plotting

### Installing Matplotlib
```python
!pip install matplotlib
```

### Importing Matplotlib
```python
import matplotlib.pyplot as plt
```

### Basic Line Chart
```python
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.title("Simple Line Chart")
plt.show()
```

### Bar Charts with Pandas
```python
df["job_title"].value_counts().plot(kind="bar")
plt.title("Job Title Counts")
plt.show()
```

### Sorting & Formatting Bar Charts
```python
df["job_title"].value_counts().sort_values().plot(kind="barh")
plt.title("Job Title Counts (Sorted)")
plt.xlabel("Count")
plt.show()
```

---

✅ **Covered Pandas Data Inspection, Grouping, and Matplotlib Basics!**

🚀 **Next Up: Advanced Visualizations with Matplotlib!**



# Data Analyst Crash Course - Notes (First Seven Hours)

## Setting Up VS Code for Python Development

### Why Use VS Code?
- One of the most popular code editors, and free to use.
- Lightweight, fast, and supports multiple languages.
- Highly customizable with extensions.

### Installing VS Code
1. **Download & Install:**
   - Go to [VS Code website](https://code.visualstudio.com/)
   - Download for Windows/Mac/Linux.
   - Follow installation prompts.
2. **Install Python Extension:**
   - Open VS Code.
   - Go to **Extensions** (Ctrl+Shift+X / Cmd+Shift+X on Mac).
   - Search for **Python** and install the Microsoft extension.

### Connecting Python to VS Code
1. Open **Command Palette** (`Ctrl+Shift+P` / `Cmd+Shift+P` on Mac).
2. Search for **Python: Select Interpreter**.
3. Choose **Anaconda/Virtual Environment/Python Installed Version**.
4. Create a Python file (`test.py`) and write:
   ```python
   print("Hello, Data Analyst!")
   ```
5. Save and **Run File** from the top right.

---

## Virtual Environments in Python

### What is a Virtual Environment?
- An isolated space for projects with **separate dependencies**.
- Avoids conflicts between projects.

### Creating a Virtual Environment
- **Using Conda (Recommended for Data Science)**
  ```sh
  conda create --name data_env python=3.11
  ```
- **Using venv (Default Python Method)**
  ```sh
  python -m venv data_env
  ```

### Activating Virtual Environment
- **Conda**:
  ```sh
  conda activate data_env
  ```
- **venv (Windows)**:
  ```sh
  data_env\Scripts\activate
  ```
- **venv (Mac/Linux)**:
  ```sh
  source data_env/bin/activate
  ```

### Installing Libraries in Virtual Environment
```sh
pip install pandas numpy matplotlib
```

### Deactivating & Removing Virtual Environment
- **Deactivate:** `conda deactivate` (Conda) or `deactivate` (venv)
- **Remove:** `conda remove --name data_env --all` (Conda); for venv, delete the `data_env` folder

---

## Managing Python Packages

### Checking Installed Packages
```sh
pip list
conda list
```

### Installing Specific Package Versions
```sh
pip install numpy==1.21.0
```

### Uninstalling a Package
```sh
pip uninstall pandas
```

---

## Jupyter Notebooks in VS Code

### Installing Jupyter Extension
- Go to **Extensions** (`Ctrl+Shift+X` / `Cmd+Shift+X` on Mac).
- Search for **Jupyter** and install.

### Creating a Jupyter Notebook
1. Open VS Code.
2. Click **New File** → Name it **test.ipynb**.
3. Open the file → Select **Python Kernel** (same as Virtual Environment).
4. Create & run code cells.
   ```python
   print("Hello, Jupyter!")
   ```

### Converting Between Python & Jupyter
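- Convert `.py` to `.ipynb` (Jupyter Notebook). `nbconvert` only accepts notebooks as input, so this direction needs a separate tool such as `jupytext` (`pip install jupytext`):
  ```sh
  jupytext --to notebook script.py
  ```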
- Convert `.ipynb` to `.py`:
  ```sh
  jupyter nbconvert --to script notebook.ipynb
  ```

---

## Pandas: Accessing Data Efficiently

### `.loc[]` vs `.iloc[]` for Selecting Data
| Method | Usage | Example |
|--------|-------|---------|
| `.iloc[]` | Integer-position-based selection | `df.iloc[0, 2]` (First row, third column) |
| `.loc[]` | Label-based selection | `df.loc[0, "salary"]` (Row index 0, column "salary") |

```python
# Example DataFrame
import pandas as pd

data = {"Job": ["Data Analyst", "Data Scientist"], "Salary": [70000, 100000]}
df = pd.DataFrame(data)
print(df.loc[0, "Job"])  # Output: Data Analyst
```
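
With the default index (0, 1, 2, ...), labels and positions coincide, which hides the difference between the two accessors. A sketch with a deliberately non-default index (the index values are arbitrary):

```python
import pandas as pd

df = pd.DataFrame(
    {"Job": ["Data Analyst", "Data Scientist", "Data Engineer"],
     "Salary": [70000, 100000, 110000]},
    index=[10, 20, 30],  # deliberately not 0, 1, 2
)

print(df.iloc[0, 1])         # 70000: first row, second column, by position
print(df.loc[10, "Salary"])  # 70000: row labelled 10, column "Salary"

# Slices differ too: iloc excludes the end position, loc includes the end label
print(len(df.iloc[0:2]))     # 2 rows (positions 0 and 1)
print(len(df.loc[10:30]))    # 3 rows (labels 10, 20 and 30)
```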

### Filtering Data Using `.loc[]`
```python
filtered_df = df.loc[df["Salary"] > 75000]
print(filtered_df)
```

### Handling Missing Data
```python
df.fillna("Unknown", inplace=True)  # Fill NaN values with "Unknown"
```

---

✅ **Covered VS Code Setup, Virtual Environments, Jupyter Notebooks, and Advanced Pandas Operations!**

🚀 **Next Up: Advanced Data Cleaning and Pandas Data Management!**



# Data Analyst Crash Course - Notes (First Nine Hours)

## Pandas: Index Management

### Resetting Index
- **Resets the index to default numerical values.**
- Example:
  ```python
  df.reset_index(inplace=True)  # Moves the current index into a column and restores the default 0..n-1 index
  ```
- **Removing the old index:**
  ```python
  df.reset_index(drop=True, inplace=True)  # Drops the old index permanently
  ```

### Setting an Index
- **Makes a specific column the index.**
  ```python
  df.set_index('Job Index', inplace=True)  # Sets 'Job Index' column as index
  ```

### Sorting by Index
- **Sorts values based on the index.**
  ```python
  df.sort_index(inplace=True)  # Sorts DataFrame by index values
  ```
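
A compact sketch of the three index operations on a made-up DataFrame, using the non-`inplace` form so each intermediate result stays visible:

```python
import pandas as pd

df = pd.DataFrame({
    "Job Index": [3, 1, 2],
    "Job Title": ["Data Engineer", "Data Analyst", "Data Scientist"],
})

indexed = df.set_index("Job Index")       # 'Job Index' becomes the row labels
ordered = indexed.sort_index()            # rows now ordered 1, 2, 3
restored = ordered.reset_index()          # 'Job Index' is a regular column again
dropped = ordered.reset_index(drop=True)  # old index discarded instead

print(list(ordered.index))     # [1, 2, 3]
print(list(restored.columns))  # ['Job Index', 'Job Title']
print(list(dropped.columns))   # ['Job Title']
```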

---

## Exercise: Job Demand Analysis

### **Goal:**
- Analyze job demand by grouping jobs by month.

### **Steps:**
1. Filter jobs by country.
2. Extract the month from the date column.
3. Convert the month to a readable format (`January`, `February`, etc.).
4. Pivot data with months as rows, job titles as columns, and count occurrences.
5. Sort months chronologically.
6. Plot the results.

### **Example:**
```python
# Extract month name from date column (requires a datetime column)
df['Month'] = df['Job Posted Date'].dt.strftime('%B')
# Create a pivot table to count job postings by month and job title
df_pivot = df.pivot_table(index='Month', columns='Job Title', aggfunc='size')
# Sort months in chronological order; a plain sort_index() would sort the names alphabetically
df_pivot = df_pivot.sort_index(key=lambda months: pd.to_datetime(months, format='%B').month)
```
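
Steps 1 to 5 can be sketched end to end on a small made-up dataset. The `Country` column and the sample rows are hypothetical. Carrying a numeric `Month No` next to the month name is one way to get calendar order, not necessarily the course's approach:

```python
import pandas as pd

# Hypothetical sample data; column names follow the example above
df = pd.DataFrame({
    "Country": ["US", "US", "US", "UK", "US"],
    "Job Title": ["Data Analyst", "Data Engineer", "Data Analyst",
                  "Data Analyst", "Data Analyst"],
    "Job Posted Date": pd.to_datetime(
        ["2024-02-10", "2024-01-05", "2024-01-20", "2024-01-07", "2024-04-02"]
    ),
})

# Steps 1-3: filter by country, then derive a month number and a month name
df_us = df[df["Country"] == "US"].copy()
df_us["Month No"] = df_us["Job Posted Date"].dt.month
df_us["Month"] = df_us["Job Posted Date"].dt.strftime("%B")

# Step 4: count postings per month and job title
df_pivot = df_us.pivot_table(
    index=["Month No", "Month"], columns="Job Title", aggfunc="size", fill_value=0
)

# Step 5: sorting on the numeric level gives calendar order; then drop that level
df_pivot = df_pivot.sort_index().droplevel("Month No")
print(df_pivot)
```

The result can be passed to `df_pivot.plot(kind="line")` for step 6.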

---

## Pandas: Merging & Concatenating DataFrames

### Merging DataFrames
- **Combines datasets using a common column.**
- Example:
  ```python
  merged_df = df_jobs.merge(df_companies, on='Company Name')  # Merges two DataFrames based on 'Company Name'
  ```

### Concatenating DataFrames
- **Stacks rows together from multiple DataFrames.**
- Example:
  ```python
  df_combined = pd.concat([df_jan, df_feb, df_mar], ignore_index=True)  # Stacks rows together, resets index
  ```
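
A self-contained sketch of both operations. The frames and company names are made up. Note that `merge` defaults to an inner join, so rows without a match are dropped unless `how` is set:

```python
import pandas as pd

df_jobs = pd.DataFrame({
    "Job Title": ["Data Analyst", "Data Engineer", "Data Scientist"],
    "Company Name": ["TechCorp", "DataWorks", "Unlisted Inc"],
})
df_companies = pd.DataFrame({
    "Company Name": ["TechCorp", "DataWorks"],
    "Country": ["US", "UK"],
})

# Default is an inner join: only rows with a match in both frames survive
inner = df_jobs.merge(df_companies, on="Company Name")
# how="left" keeps every job and fills missing company info with NaN
left = df_jobs.merge(df_companies, on="Company Name", how="left")
print(len(inner), len(left))  # 2 3

# concat stacks rows; ignore_index=True renumbers them from 0
df_combined = pd.concat([df_jobs.iloc[:2], df_jobs.iloc[2:]], ignore_index=True)
print(list(df_combined.index))  # [0, 1, 2]
```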

---

## Pandas: Exporting Data

### Saving to CSV
```python
df.to_csv('data.csv', index=False)  # Saves DataFrame to CSV file without index
```

### Saving to Excel
```python
df.to_excel('data.xlsx', index=False)  # Saves DataFrame to Excel file
```
- Ensure `openpyxl` is installed for Excel support:
  ```sh
  pip install openpyxl
  ```

---

## Pandas: Applying Functions

### Using `.apply()`
- **Transforms data using a function.**
- Example: Adjust salaries for inflation (+3% increase):
  ```python
  df['Projected Salary'] = df['Salary'].apply(lambda x: x * 1.03)  # Increases salary by 3%
  ```
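
For simple arithmetic like this, operating on the whole column gives the same numbers without a per-row function call. A sketch comparing the two on made-up salaries:

```python
import pandas as pd

df = pd.DataFrame({"Salary": [70000, 100000]})

# Element-wise function via .apply()
df["Projected Salary"] = df["Salary"].apply(lambda x: x * 1.03)

# Column-wise (vectorized) arithmetic: same values, no Python-level loop
df["Projected (vectorized)"] = df["Salary"] * 1.03

print(df)
```

`.apply()` is still the right tool when the logic cannot be written as column arithmetic, for example when parsing strings row by row.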

---

## Pandas: Exploding Lists

### **What is `.explode()`?**
- **Converts list-type column values into individual rows.**
- Example:
  ```python
  df_exploded = df.explode('Skills')  # Expands list-type values into separate rows
  ```
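
A small sketch of what `.explode()` does to the shape of a DataFrame. The data is made up:

```python
import pandas as pd

df = pd.DataFrame({
    "Job Title": ["Data Analyst", "Data Engineer"],
    "Skills": [["Python", "SQL"], ["Python", "SQL", "AWS"]],
})

df_exploded = df.explode("Skills")
print(df_exploded)

# Each list element gets its own row; the original index label is repeated
print(len(df), "->", len(df_exploded))  # 2 -> 5
print(list(df_exploded.index))          # [0, 0, 1, 1, 1]
```

Because the index labels repeat, follow up with `reset_index(drop=True)` if unique row labels are needed.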

---

## Matplotlib: Formatting Charts

### Creating Subplots for Comparison
```python
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 2)  # Creates 1 row, 2 columns of subplots
df1.plot(kind='bar', ax=ax[0])  # Plots first DataFrame in first subplot
df2.plot(kind='bar', ax=ax[1])  # Plots second DataFrame in second subplot
plt.show()
```

---

## Matplotlib: Pie Charts

### Best Use Cases
- **Suitable for two or three categories.**

### **Example: Work-from-home status visualization**
```python
df['Work From Home'].value_counts().plot(kind='pie', autopct='%1.1f%%', startangle=90)  # Pie chart with percentage labels
plt.show()
```

---

## Matplotlib: Scatter Plots (Showing Correlation)

### **Used to compare variables (e.g., Salary vs. Skill Demand).**
```python
df.plot(kind='scatter', x='Skill Count', y='Median Salary')  # Plots skill count vs salary as scatter plot
plt.show()
```

### Enhancing Scatter Plots
- **Add labels using `plt.text()`**:
  ```python
  for i, txt in enumerate(df['Skill Name']):
      # .iloc[i] selects by position, so this also works when the index is not 0, 1, 2, ...
      plt.text(df['Skill Count'].iloc[i], df['Median Salary'].iloc[i], txt)  # Adds a label to each point
  ```

---

## Matplotlib: Advanced Customization

### **Changing Line Styles and Colors**
```python
df.plot(kind='line', linewidth=3, linestyle='--', color='red')  # Dashed red line with thickness 3
plt.show()
```

### **Adjusting Figure Size**
```python
plt.figure(figsize=(10, 5))  # Creates a new 10x5-inch figure; call this before plotting
```

### **Using `adjustText` for Avoiding Label Overlaps**
```python
from adjustText import adjust_text  # Third-party package: pip install adjustText
texts = [plt.text(x, y, txt) for x, y, txt in zip(df['Skill Count'], df['Median Salary'], df['Skill Name'])]
adjust_text(texts)  # Adjusts text positions to prevent overlapping
```

---

✅ **Covered Pandas Index Management, Merging, Exporting, Applying Functions, and Advanced Matplotlib Customization!**

🚀 **Next Up: Seaborn for Advanced Visualizations!**



# Data Analyst Crash Course - Notes (First Ten Hours)

## Pandas: Exploding Lists & Skill Analysis

### Breaking Down Lists into Rows
- **`.explode()` converts list-type column values into individual rows.**
- Example:
  ```python
  df_exploded = df.explode('Job Skills')  # Expands list-type values into separate rows
  ```
- This makes it easier to analyze job skills individually.
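
`.explode()` only splits real Python lists. When the data comes from a CSV file, list columns often arrive as strings such as `"['sql', 'excel']"`. The sketch below assumes that situation (with made-up data) and converts the strings with `ast.literal_eval` before exploding.

```python
import ast
import pandas as pd

# Made-up sample: the skills are stored as strings that merely look like lists
df = pd.DataFrame({
    'Job Title': ['Data Analyst', 'Data Engineer'],
    'Job Skills': ["['sql', 'excel']", "['python', 'sql', 'aws']"],
})

# Parse each string into a list; leave missing values untouched
df['Job Skills'] = df['Job Skills'].apply(lambda s: ast.literal_eval(s) if pd.notna(s) else s)

df_exploded = df.explode('Job Skills')
print(len(df_exploded))  # 5
```

Without the conversion step, `.explode()` would leave each string as a single value and the row count would stay at 2.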

### Counting Skills in Job Listings
- **Use `.value_counts()` to count occurrences of each skill.**
- Example:
  ```python
  df_exploded['Job Skills'].value_counts()  # Counts frequency of each skill
  ```
- To visualize only the top 10 skills:
  ```python
  df_exploded['Job Skills'].value_counts().head(10).plot(kind='bar')  # Bar chart of top 10 skills
  ```
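
A small worked example with made-up postings shows the ordering that `.value_counts()` produces (most frequent first), which is why `.head(10)` yields the top skills.

```python
import pandas as pd

# Made-up sample of three job postings
df = pd.DataFrame({'Job Skills': [['sql', 'python'], ['sql', 'excel'], ['sql', 'python']]})
df_exploded = df.explode('Job Skills')

skill_counts = df_exploded['Job Skills'].value_counts()

print(skill_counts.index.tolist())   # ['sql', 'python', 'excel']
print(skill_counts.tolist())         # [3, 2, 1]
print(skill_counts.head(2).index.tolist())  # ['sql', 'python']
```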

### Grouping Skills by Job Titles
- **Compare skill demand across different job titles.**
  ```python
  skill_count = df_exploded.groupby(['Job Title', 'Job Skills']).size().reset_index(name='Skill Count')
  ```
- This groups job titles and skills together, showing the count for each.
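
To make the shape of the result concrete, here is a sketch on made-up data: one row per (job title, skill) pair, with the number of exploded rows in that group.

```python
import pandas as pd

# Made-up sample of three job postings
df = pd.DataFrame({
    'Job Title': ['Data Analyst', 'Data Analyst', 'Data Engineer'],
    'Job Skills': [['sql', 'excel'], ['sql'], ['python', 'sql']],
})
df_exploded = df.explode('Job Skills')

skill_count = (
    df_exploded.groupby(['Job Title', 'Job Skills'])
    .size()                            # number of rows in each (title, skill) group
    .reset_index(name='Skill Count')   # turn the group keys back into columns
)

print(skill_count.columns.tolist())  # ['Job Title', 'Job Skills', 'Skill Count']
print(len(skill_count))              # 4
```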

### Sorting Values for Visualization
- **Sort skills by frequency:**
  ```python
  skill_count.sort_values(by='Skill Count', ascending=False, inplace=True)  # Sorts in descending order
  ```

### Filtering for Specific Job Titles
- **Extract top 10 skills for Data Analysts:**
  ```python
  job_title = 'Data Analyst'
  top_skills = 10
  df_filtered = skill_count[skill_count['Job Title'] == job_title].head(top_skills)
  ```
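
Note that `.head(top_skills)` returns the most frequent skills only because `skill_count` was sorted beforehand. One way to make that dependency explicit is to wrap the steps in a small function; `top_skills_for` below is a hypothetical helper, not something from the course, and the data is made up.

```python
import pandas as pd

def top_skills_for(skill_count, job_title, n=10):
    """Hypothetical helper: the n most frequent skills for one job title."""
    subset = skill_count[skill_count['Job Title'] == job_title]
    # Sort inside the function so the result does not depend on earlier sorting
    return subset.sort_values(by='Skill Count', ascending=False).head(n)

# Made-up counts, deliberately unsorted
skill_count = pd.DataFrame({
    'Job Title': ['Data Analyst', 'Data Analyst', 'Data Analyst', 'Data Engineer'],
    'Job Skills': ['excel', 'sql', 'python', 'python'],
    'Skill Count': [40, 90, 60, 75],
})

df_filtered = top_skills_for(skill_count, 'Data Analyst', n=2)
print(df_filtered['Job Skills'].tolist())  # ['sql', 'python']
```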

### Plotting Top Skills for a Job Title
- **Using horizontal bar charts:**
  ```python
  df_filtered.plot(kind='barh', x='Job Skills', y='Skill Count')  # Horizontal bar chart
  ```

### Improving Readability in Charts
- **Invert Y-axis so the most in-demand skills appear at the top:**
  ```python
  plt.gca().invert_yaxis()
  ```
- **Add meaningful titles and labels:**
  ```python
  plt.title(f'Top {top_skills} Skills for {job_title}')
  plt.xlabel('Job Posting Count')
  plt.ylabel('')  # No need for redundant labels
  ```
- **Remove unnecessary legends:**
  ```python
  plt.legend().set_visible(False)
  ```

---

## Exercise: Trending Skills Analysis

### Tracking Skill Demand Over Time
- **Extracting month from job posting dates:**
  ```python
  df['Posted Month'] = df['Job Posted Date'].dt.month  # Converts date to numerical month
  ```
- **Pivoting data to track skill trends over time:**
  ```python
  df_pivot = df.explode('Job Skills').pivot_table(index='Posted Month', columns='Job Skills', aggfunc='size', fill_value=0)  # Explode first: list values cannot serve as column keys
  ```
- **Sorting skills by total demand:**
  ```python
  df_pivot.loc['Total'] = df_pivot.sum()  # Adds row with total skill counts
  df_pivot = df_pivot.sort_values(by='Total', axis=1, ascending=False)  # Sorts skills by highest demand
  df_pivot.drop(index='Total', inplace=True)  # Removes total row after sorting
  ```
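
The steps above can be traced end to end on a tiny made-up dataset. The sketch assumes the date column starts out as text (so it needs `pd.to_datetime` before `.dt` works) and that the skills are exploded before pivoting.

```python
import pandas as pd

# Made-up sample of three job postings
df = pd.DataFrame({
    'Job Posted Date': ['2023-01-05', '2023-01-20', '2023-02-11'],
    'Job Skills': [['sql', 'python'], ['sql'], ['python', 'sql', 'excel']],
})
df['Job Posted Date'] = pd.to_datetime(df['Job Posted Date'])  # .dt requires a datetime column
df['Posted Month'] = df['Job Posted Date'].dt.month

df_exploded = df.explode('Job Skills')  # one row per (posting, skill)
df_pivot = df_exploded.pivot_table(
    index='Posted Month', columns='Job Skills', aggfunc='size', fill_value=0
)

# Temporary 'Total' row, used only to order the columns by overall demand
df_pivot.loc['Total'] = df_pivot.sum()
df_pivot = df_pivot.sort_values(by='Total', axis=1, ascending=False)
df_pivot = df_pivot.drop(index='Total')

print(df_pivot.columns.tolist())  # ['sql', 'python', 'excel']
print(df_pivot['sql'].tolist())   # [2, 1]
```

Because the columns end up ordered by total demand, `df_pivot.iloc[:, :5]` selects the five most requested skills.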

### Visualizing Skill Trends
- **Selecting the top 5 most in-demand skills:**
  ```python
  df_pivot.iloc[:, :5].plot(kind='line')  # Line chart of top 5 skills over time
  ```
- **Adding labels and formatting:**
  ```python
  plt.title('Top 5 Skills Demand Trend Over Time')
  plt.xlabel('Month')
  plt.ylabel('Skill Count')
  ```

---

✅ **Covered Pandas Skill Analysis, Trending Skill Demand, and Advanced Chart Formatting!**

🚀 **Next Up: Seaborn for Advanced Visualizations!**



# Data Analyst Crash Course - Notes (First Eleven Hours)

## Matplotlib: Formatting Axes & Customizing Plots

### Using Custom Formatters for Readability
- **Formatting salary values to show as `$100K` instead of long numbers.**
- Example:
  ```python
  import matplotlib.ticker as mticker
  ax = plt.gca()  # Get current axis
  ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, pos: f'${int(x/1000)}K'))
  ```
- This ensures that large salary numbers are easier to read.
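
A `FuncFormatter` simply wraps a function that receives the tick value and its position and returns the label text, so the formatting logic can be checked without drawing a chart. In the sketch below, `salary_k` is a hypothetical name for that function.

```python
import matplotlib.ticker as mticker

def salary_k(x, pos):
    """Hypothetical label function: 100000 -> '$100K' (pos is the tick position, unused)."""
    return f'${int(x / 1000)}K'

formatter = mticker.FuncFormatter(salary_k)

print(formatter(100000, 0))  # $100K
print(formatter(62500, 1))   # $62K
```

Note that `int()` truncates, so 62,500 is shown as `$62K`; use `round()` inside the function if rounding is preferred.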

### Adjusting Axis Limits
- **To focus on a meaningful range of values, adjust the x/y limits:**
  ```python
  plt.xlim(0, 250000)  # Limit salary range for better visualization
  ```
- Helps in cases where outliers skew the data.

---

## Matplotlib: Histograms

### Understanding Histograms
- **Histograms show the distribution of numerical values.**
- Example: Distribution of salary values.

### Creating a Histogram
- Example:
  ```python
  df['Salary'].plot(kind='hist', bins=30, edgecolor='black')
  ```
- **Bins**: The number of bars in the histogram. More bins = more granularity.
- **Edge Color**: Adds black borders to make bars more distinct.
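
To see what the bins actually contain, the counts can be computed without plotting. The sketch below uses NumPy's `histogram` on made-up salaries as a stand-in for the binning a histogram plot performs: the value range is split into equal-width intervals and the values in each interval are counted.

```python
import numpy as np
import pandas as pd

salaries = pd.Series([50000, 60000, 70000, 80000, 90000, 200000])  # made-up values

counts, edges = np.histogram(salaries, bins=3)

print(edges.tolist())   # [50000.0, 100000.0, 150000.0, 200000.0]
print(counts.tolist())  # [5, 0, 1]
```

The single 200,000 salary stretches the range so that most values fall into the first bin, which is the situation where limiting the x-axis or using more bins helps.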

### Customizing Histograms
- **Adjust the number of bins:**
  ```python
  df['Salary'].plot(kind='hist', bins=50, edgecolor='black')
  ```
- **Limit x-axis for clarity:**
  ```python
  plt.xlim(0, 250000)
  ```
- **Formatting salary labels as `$100K`:**
  ```python
  ax = plt.gca()
  ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, pos: f'${int(x/1000)}K'))
  ```

---

## Matplotlib: Box Plots

### Understanding Box Plots
- **Box plots show salary distribution & outliers.**
- **Median (Middle Line)**: The 50th percentile.
- **Interquartile Range (Box)**: 25th to 75th percentile.
- **Whiskers**: By default, extend to the furthest data points within 1.5 × IQR of the box edges.
- **Outliers (Dots)**: Unusually high/low salaries.
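
The numbers behind a box plot can be computed by hand. The sketch below uses made-up salaries and assumes the common 1.5 × IQR rule for outliers, which is Matplotlib's default (`whis=1.5`).

```python
import pandas as pd

salaries = pd.Series([60000, 70000, 80000, 90000, 100000, 300000])  # made-up values

q1, median, q3 = salaries.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1                   # height of the box
lower_fence = q1 - 1.5 * iqr    # points below this are drawn as outliers
upper_fence = q3 + 1.5 * iqr    # points above this are drawn as outliers

outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print(q1, median, q3)            # 72500.0 85000.0 97500.0
print(lower_fence, upper_fence)  # 35000.0 135000.0
print(outliers.tolist())         # [300000]
```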

### Creating a Box Plot
- Example:
  ```python
  df.boxplot(column='Salary', vert=False)
  ```
- **Horizontal Layout** (`vert=False`) makes comparison easier.

### Comparing Job Roles with Box Plots
- Example:
  ```python
  df.boxplot(column='Salary', by='Job Title', vert=False)
  ```
- Helps compare salary distribution between Data Analysts, Data Engineers, and Data Scientists.

---

## Exercise: Skill Pay Analysis

### Goal:
- **Compare high-paying skills vs. most in-demand skills.**

### Step 1: Exploding Job Skills
- **Split skills into separate rows:**
  ```python
  df_exploded = df.explode('Job Skills')
  ```

### Step 2: Finding Top-Paying Skills
- **Group data by skill and calculate median salary:**
  ```python
  skill_salary = df_exploded.groupby('Job Skills')['Salary'].median().sort_values(ascending=False)
  ```
- **Display top 10:**
  ```python
  skill_salary.head(10)
  ```

### Step 3: Finding Most In-Demand Skills
- **Count how often each skill appears:**
  ```python
  skill_counts = df_exploded['Job Skills'].value_counts()
  ```
- **Display top 10:**
  ```python
  skill_counts.head(10)
  ```
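
Steps 2 and 3 can also be combined into a single table with one `groupby`, which makes it easier to compare pay and demand per skill. This is an alternative sketch on made-up data, not the course's method; the output column names `median_salary` and `posting_count` are chosen for the example.

```python
import pandas as pd

# Made-up sample of three job postings
df = pd.DataFrame({
    'Job Skills': [['sql', 'python'], ['sql'], ['python', 'aws']],
    'Salary': [100000, 80000, 140000],
})
df_exploded = df.explode('Job Skills')

summary = df_exploded.groupby('Job Skills')['Salary'].agg(
    median_salary='median',   # Step 2: typical pay per skill
    posting_count='count',    # Step 3: demand (rows with a non-missing salary)
)

print(summary.loc['sql', 'median_salary'])     # 90000.0
print(summary.loc['python', 'median_salary'])  # 120000.0
print(summary.loc['aws', 'posting_count'])     # 1
```

One difference to keep in mind: `'count'` skips rows where `Salary` is missing, while `.value_counts()` on the skill column counts every posting.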

### Step 4: Plotting the Results
- **Stacked bar charts (one above the other) for salary & demand:**
  ```python
  fig, ax = plt.subplots(2, 1)  # 2 rows, 1 column
  skill_salary.head(10).plot(kind='barh', ax=ax[0], title='Top-Paying Skills')
  skill_counts.head(10).plot(kind='barh', ax=ax[1], title='Most In-Demand Skills')
  plt.show()
  ```

---

## Seaborn: Introduction

### What is Seaborn?
- **Seaborn is a visualization library built on top of Matplotlib.**
- Provides better aesthetics & built-in themes.

### Installing & Importing Seaborn
- **Installation:**
  ```bash
  pip install seaborn
  ```
- **Importing:**
  ```python
  import seaborn as sns
  ```

### Seaborn vs. Matplotlib
- **Matplotlib**: Basic, requires customization.
- **Seaborn**: Pre-styled with built-in themes & color palettes.

### Example Comparison
```python
import seaborn as sns
sns.boxplot(x=df['Salary'])  # Easier than Matplotlib!
```

### Applying Seaborn Themes
- **Change appearance with `set_theme()`:**
  ```python
  sns.set_theme(style='darkgrid')
  ```
- **Other styles:** `'whitegrid'`, `'ticks'`, `'dark'`, `'white'`.

### Creating a Styled Box Plot
- Example:
  ```python
  sns.boxplot(x='Salary', y='Job Title', data=df)
  ```
- Adds color & improves readability over Matplotlib.

---

✅ **Covered Advanced Matplotlib Formatting, Skill Pay Analysis, and Seaborn Basics!**

🚀 **Next Up: Seaborn for Advanced Data Visualizations!**



# Data Analyst Crash Course - Notes (Complete Course)

## Seaborn: Introduction & Advanced Data Visualizations

### Additional Seaborn Visualizations
- **Correlation Heatmap:**
  ```python
  sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')  # numeric_only skips text columns such as job titles
  ```
- **Pairplot for Visualizing Relationships:**
  ```python
  sns.pairplot(df, hue='Job Title')
  ```
- **Countplot for Skill Demand Analysis:**
  ```python
  sns.countplot(y=df_exploded['Job Skills'], order=df_exploded['Job Skills'].value_counts().index[:10])
  ```
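
A heatmap draws whatever matrix it is given, so it can help to inspect the correlation matrix on its own first. The sketch below uses made-up data; `Years Experience` is a hypothetical column added only so that there are two numeric columns to correlate.

```python
import pandas as pd

# Made-up sample with one text column and two numeric columns
df = pd.DataFrame({
    'Job Title': ['Data Analyst', 'Data Engineer', 'Data Scientist', 'Data Analyst'],
    'Salary': [80000, 120000, 130000, 90000],
    'Years Experience': [1, 4, 5, 2],  # hypothetical column for the example
})

corr = df.corr(numeric_only=True)  # text columns are left out of the calculation

print(corr.columns.tolist())  # ['Salary', 'Years Experience']
print(corr.shape)             # (2, 2)
```

The resulting square matrix has 1.0 on the diagonal (each column correlates perfectly with itself), and this is the table that `sns.heatmap(..., annot=True)` colours and labels.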

---

✅ **Complete Course Summary:**

- ✔ **Python Basics & Data Structures** (Variables, Lists, Dictionaries, Loops, Functions)
- ✔ **Pandas for Data Manipulation** (Cleaning, Grouping, Merging, Aggregating)
- ✔ **Matplotlib for Basic & Advanced Visualizations** (Histograms, Box Plots, Custom Formatting)
- ✔ **Seaborn for Beautiful & Advanced Visualizations** (Box Plots, Heatmaps, Pairplots)
- ✔ **Data Analysis Project Exercises** (Skill Pay Analysis, Trending Skills, Job Market Insights)

🚀 **Congratulations! You've completed the Data Analyst Crash Course!** 🎉

