## Plotting using Matplotlib

### Matplotlib Overview

Matplotlib is Python's primary plotting library for creating static, animated, and interactive visualizations. It provides a MATLAB-like interface through its pyplot module, making it familiar to users coming from that environment.

The library offers two main interfaces: a stateful pyplot interface for quick plots and an object-oriented interface for more complex, customizable figures. We can create a wide range of visualizations including line plots, scatter plots, bar charts, histograms, heatmaps, and 3D plots.

Key components include figures (the overall container), axes (the plotting area), and artists (everything we see on the plot). Matplotlib integrates well with NumPy arrays and pandas DataFrames.
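
To make the two interfaces and the figure/axes split concrete, here is a minimal sketch that draws the same line both ways. The `x`/`y` values are made up for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 15, 25]  # made-up values for illustration

# 1) Stateful pyplot interface: plt.* calls act on the "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")

# 2) Object-oriented interface: create the Figure and Axes explicitly,
#    then call methods on the Axes object
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Object-oriented interface")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

In short notebook cells the pyplot style is quicker to type; the object-oriented style tends to scale better once a figure has several axes, because every call names the axes it targets.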

Common use cases include exploratory data analysis, scientific visualization, and creating publication-ready figures. While it's powerful and highly customizable, some find its syntax verbose compared to newer libraries such as Seaborn, which is built on top of Matplotlib, or Plotly, which uses its own JavaScript-based rendering.

The library supports multiple output formats (PNG, PDF, SVG, etc.) and backends for different display environments, from Jupyter notebooks to web applications.
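
A short sketch of exporting one figure to several formats, intended as a standalone script. It selects the non-interactive `Agg` backend (handy on servers with no display, but it would switch off inline display in Jupyter) and writes to a temporary folder; the file name `price_trend` is a hypothetical choice.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([2015, 2016, 2017], [48000, 54000, 57000])

out_dir = tempfile.mkdtemp()
saved = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, f"price_trend.{ext}")
    fig.savefig(path)  # output format is inferred from the file extension
    saved.append(path)

print(saved)
```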

### Types of Data
- Numerical Data
- Categorical Data

Graph plotting techniques: 2D line plot, scatter plot, bar chart, histogram, pie chart.

Before plotting any data, we need to know its type (numerical or categorical), since the type decides which plot is appropriate.
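
One quick way to check column types before choosing a plot is pandas' `select_dtypes`. The DataFrame below is hypothetical. Note that dtype alone can mislead: a column like `season` is stored as an integer but is often treated as time or a category.

```python
import pandas as pd

# Small hypothetical DataFrame mixing numerical and categorical columns
df = pd.DataFrame({
    "season": [2008, 2009, 2010],
    "team": ["RCB", "MI", "RCB"],
    "runs": [165, 246, 307],
})

numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numerical_cols)    # ['season', 'runs']
print(categorical_cols)  # ['team']
```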

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('default')
```

### 2D Line Plots

```python
# plotting two simple lists
price = [48000, 54000, 57000, 49000, 47000, 45000]
year = [2015, 2016, 2017, 2018, 2019, 2020]

# line plot
plt.plot(year, price)
```

```python
# import csv file
batsmen_dataset = pd.read_csv('./sharma-kohli.csv')

# plotting multiple plots
plt.plot(batsmen_dataset['index'], batsmen_dataset['V Kohli'], color="#f1470f", linestyle='solid', linewidth=3, marker='D', markersize=10)
plt.plot(batsmen_dataset['index'], batsmen_dataset['RG Sharma'], color="#0054fc", linestyle='dashdot', linewidth=2, marker='D', markersize=6)

# title and axis labels
plt.title('Rohit Sharma Vs Virat Kohli in IPL')
plt.xlabel('Season')
plt.ylabel('Runs Scored')

# colors(hex) and line(width and style) and marker(size)
# plt.plot(batsmen_dataset['index'], batsmen_dataset['V Kohli'], color="#f1470f")

# adding line style
# plt.plot(batsmen_dataset['index'], batsmen_dataset['V Kohli'], color="#f1470f", linestyle='dashed')

# using marker
# plt.plot(batsmen_dataset['index'], batsmen_dataset[
```

```python
# legend -- location
plt.plot(batsmen_dataset['index'], batsmen_dataset['V Kohli'], color="#f1470f", linestyle='solid', linewidth=3, marker='D', markersize=10, label='Virat')
plt.plot(batsmen_dataset['index'], batsmen_dataset['RG Sharma'], color="#0054fc", linestyle='dashdot', linewidth=2, marker='D', markersize=6, label='Rohit')

# title and axis labels
plt.title('Rohit Sharma Vs Virat Kohli in IPL')
plt.xlabel('Season')
plt.ylabel('Runs Scored')

# plt.legend(loc='upper right')

# legend: show each line's label on the graph
plt.legend()
```

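```python
# limiting axes
price = [48000, 54000, 57000, 49000, 47000, 45000, 4500000]  # the 2021 value is an outlier
year = [2015, 2016, 2017, 2018, 2019, 2020, 2021]

# line plot
plt.plot(year, price)
plt.ylim(0, 75000)    # show only 0 to 75,000 on the Y-axis
plt.xlim(2017, 2019)  # show only 2017 to 2019 on the X-axis
```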

```python
# grid
plt.plot(batsmen_dataset['index'], batsmen_dataset['V Kohli'], color="#f1470f", linestyle='solid', linewidth=3, marker='D', markersize=10, label='Virat')
plt.plot(batsmen_dataset['index'], batsmen_dataset['RG Sharma'], color="#0054fc", linestyle='dashdot', linewidth=2, marker='D', markersize=6, label='Rohit')

# title and axis labels
plt.title('Rohit Sharma Vs Virat Kohli in IPL')
plt.xlabel('Season')
plt.ylabel('Runs Scored')

plt.grid()
```

```python
# show
plt.plot(batsmen_dataset['index'], batsmen_dataset['V Kohli'], color="#f1470f", linestyle='solid', linewidth=3, marker='D', markersize=10, label='Virat')
plt.plot(batsmen_dataset['index'], batsmen_dataset['RG Sharma'], color="#0054fc", linestyle='dashdot', linewidth=2, marker='D', markersize=6, label='Rohit')

# title and axis labels
plt.title('Rohit Sharma Vs Virat Kohli in IPL')
plt.xlabel('Season')
plt.ylabel('Runs Scored')

plt.grid()
plt.show()
```

### Scatter Plots

- Used in bivariate analysis
- Used for numerical vs. numerical columns
- Use case: finding the correlation between two quantities

Note: a 2D line plot is a special case of a scatter plot in which consecutive points are joined by lines.

```python
# plt.scatter on simple NumPy arrays

# linspace() creates an array of evenly spaced numbers over a specified range
x = np.linspace(-10, 10, 50)
y = 10*x + 3 + np.random.randint(0, 300, 50)

plt.scatter(x, y)
```

```python
# plt.scatter on pandas data
df = pd.read_csv('./batter.csv')

# keep only the first 50 rows
df = df.head(50)

plt.scatter(df['avg'], df['strike_rate'], color='red', marker='*')
plt.title('Avg and SR analysis of top 50 Batsmen')
plt.xlabel('Average')
plt.ylabel('Strike Rate')
```

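```python
# size: marker area encodes a third variable (party size)
# load_dataset() downloads the data, so it needs an internet connection the first time
tips = sns.load_dataset('tips')

# this is slower
plt.scatter(tips['total_bill'], tips['tip'], s=tips['size']*20)
```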

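```python
# scatter plot using plt.plot()

# without a marker, plt.plot() would join the (unsorted) points with lines
# plt.plot(tips['total_bill'], tips['tip'])

# 'o' draws markers only, so this gives a scatter plot; it is faster than plt.scatter()
plt.plot(tips['total_bill'], tips['tip'], 'o')
```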

### Bar Chart
- used in Bivariate and Univariate Analysis
- Numerical vs Categorical
- Use case - Aggregate analysis of groups

```python
# simple bar chart
```
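
A minimal sketch of the bar-chart use case (aggregate a numerical column per category, then draw one bar per group). The DataFrame is hypothetical: team names and run totals are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-match data: one row per match
matches = pd.DataFrame({
    "team": ["CSK", "MI", "CSK", "RCB", "MI", "RCB"],
    "runs": [180, 165, 200, 150, 190, 170],
})

# Aggregate first: mean of a numerical column grouped by a categorical column
avg_runs = matches.groupby("team")["runs"].mean()

# Then draw one bar per group
plt.figure()
bars = plt.bar(avg_runs.index, avg_runs.values, color="#0054fc")
plt.title("Average runs per team")
plt.xlabel("Team")
plt.ylabel("Average runs")
```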



### 🧾 **Summary: Matplotlib and Analysis Concepts**

#### 🔹 Types of Analysis Based on Number of Variables:

1. **Univariate Analysis**:

   * Analysis or plotting of a **single column or feature**.
   * Example: Plotting the distribution of "Age".

2. **Bivariate Analysis**:

   * Analysis involving **two variables/columns** together.
   * Example: Plotting "Age" vs. "Salary".

3. **Multivariate Analysis**:

   * Analysis involving **three or more variables**.
   * Example: Using scatter plots with color and size to show more than 2 variables.
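
A minimal sketch putting the three kinds of analysis side by side. The `age`, `salary` and `experience` arrays are synthetic, generated with NumPy as stand-ins for real columns.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
age = rng.integers(20, 60, size=100)                 # synthetic "Age" column
salary = age * 1000 + rng.normal(0, 5000, size=100)  # synthetic "Salary", loosely tied to age
experience = age - 20                                # synthetic third variable

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))

# Univariate: distribution of one column
ax1.hist(age, bins=10)
ax1.set_title("Univariate: Age")

# Bivariate: one column against another
ax2.scatter(age, salary)
ax2.set_title("Bivariate: Age vs Salary")

# Multivariate: a third variable encoded as marker color
points = ax3.scatter(age, salary, c=experience, cmap="viridis")
fig.colorbar(points, ax=ax3, label="Experience (years)")
ax3.set_title("Multivariate: color = Experience")
```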

---

#### 🔹 Libraries to Be Imported (for the class):

1. **NumPy** – for numerical operations.
2. **Pandas** – for structured/tabular data handling.
3. **Matplotlib** – main focus for graph plotting today.
4. **Seaborn** – imported only to access a dataset (not for plotting today).

---

### ✅ **Important Pointers**:

* Understand what kind of analysis you're doing:

  * **Univariate**, **Bivariate**, or **Multivariate**.
* Before starting:

  * Import the required libraries.
  * Download the necessary datasets (usually provided via a Google Drive link).
* Run all import cells before coding to ensure libraries are available.
* Seaborn has built-in datasets, which can be accessed even if you're not using its plotting features.
* All visualizations in this session will be made using **Matplotlib**, not Seaborn.

---

### ✅ **Concept Summary – 2-D Line Plot (Matplotlib)**

A **2-D line plot** is one of the most commonly used plots in data analysis and visualization. It is especially useful for **bivariate analysis** — analyzing the relationship between **two variables**.

---

### 📌 **Key Points to Remember**

1. **Definition**:

   * A 2D line plot shows data as points connected by straight line segments in two dimensions (X and Y axes).

2. **Use Case**:

   * Primarily used for **bivariate analysis** (i.e., comparing two variables).
   * Needs two variables: if you pass only one column, Matplotlib plots it against its position (0, 1, 2, ...) instead.
   * Common in **time-series analysis**.

3. **When to Use**:

   * When you want to visualize the **trend** or **pattern** between two columns.
   * Ideal when:

     * One column is **categorical** (like months).
     * Another column is **numerical** (like revenue).
   * Can also be used when **both columns are numerical**.

4. **Example**:

   * X-axis: Months (categorical)
   * Y-axis: Revenue per month (numerical)
   * Connecting data points forms the line plot.
   * Helps visualize how revenue changes over time.

5. **Most Common Scenario**:

   * **Time-series data** is the most frequent use case.
   * Examples:

     * Company stock prices
     * Monthly or yearly revenue
     * COVID cases per day
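
A minimal sketch of the months-vs-revenue example; the revenue figures are hypothetical. Matplotlib accepts the month names as strings and places them on the X-axis in the order given.

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
revenue = [120, 135, 128, 150, 160, 158]  # hypothetical revenue per month

plt.figure()
# String values on the X-axis are treated as categories, kept in the given order
line, = plt.plot(months, revenue, marker='o')
plt.title('Monthly revenue (hypothetical data)')
plt.xlabel('Month')
plt.ylabel('Revenue')
```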

---

### 🧠 **Important Technical Points**

* **Matplotlib** is the Python library used to create 2D plots.
* The `plot()` function in `matplotlib.pyplot` is used to create line plots.
* X-axis: Independent variable (usually time or category)
* Y-axis: Dependent variable (numerical value changing with X)
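
When only Y is passed to `plt.plot()`, Matplotlib generates X itself as the positions 0, 1, 2 and so on. A quick check with hypothetical values:

```python
import matplotlib.pyplot as plt

revenue = [120, 135, 128, 150]  # hypothetical values

plt.figure()
# Only Y is given, so Matplotlib generates the X values 0, 1, 2, 3 itself
line, = plt.plot(revenue)
print(line.get_xdata().tolist())  # [0.0, 1.0, 2.0, 3.0]
```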

---

### ✅ **Conclusion**

* **2D Line Plot** is best for visualizing how a value **changes over time** or **across categories**.
* You **must have two variables**, and it is most helpful when analyzing **trends or patterns** in **time-series** or **bivariate** data.

---

### ✅ **Summary – Matplotlib Concepts (Part 2)**

This section focuses on **practical implementation** of 2D line plots using `matplotlib.pyplot`. It covers:

* Plotting simple lists
* Plotting from a real dataset (IPL data)
* Plotting multiple lines on the same graph
* Adding labels and title for better understanding

---

### 📌 **Important Concepts & Steps**

#### 1. **Simple 2D Plot Example**

* Assume two lists:

  * `years = [2012, 2015, 2018, 2021]`
  * `prices = [45, 54, 49, 57]`
* `years` is the X-axis (categorical or time-based)
* `prices` is the Y-axis (numerical)

```python
import matplotlib.pyplot as plt

years = [2012, 2015, 2018, 2021]
prices = [45, 54, 49, 57]

plt.plot(years, prices)
plt.show()
```

> **Use case:** Simple illustration of a time-based trend (e.g., phone prices).

---

#### 2. **Using a Dataset (Pandas DataFrame)**

* Dataset contains IPL data of two batsmen (Virat Kohli and Rohit Sharma).
* Columns:

  * `Season` – e.g., 2008, 2009, 2010...
  * `Virat Kohli` – Runs per season
  * `Rohit Sharma` – Runs per season

```python
import pandas as pd
import matplotlib.pyplot as plt

# Example DataFrame (simplified)
data = {
    'Season': [2008, 2009, 2010, 2011],
    'Virat Kohli': [165, 246, 307, 557],
    'Rohit Sharma': [404, 362, 362, 372]
}
df = pd.DataFrame(data)

plt.plot(df['Season'], df['Virat Kohli'])  # Plotting single player
plt.show()
```

> ✅ **Best Use Case:** Real-world time-series data.

---

#### 3. **Plotting Multiple Lines (Comparison)**

* Plotting both Virat and Rohit on the same graph for comparison:

```python
plt.plot(df['Season'], df['Virat Kohli'], label='Virat Kohli')
plt.plot(df['Season'], df['Rohit Sharma'], label='Rohit Sharma')

plt.title('Virat Kohli vs Rohit Sharma - IPL Runs Over Seasons')
plt.xlabel('Season')
plt.ylabel('Runs Scored')
plt.legend()
plt.show()
```

> 🎯 Makes it easier to visually compare two trends over time.
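
When there are more players, a loop over the column names avoids repeating `plt.plot()`. A sketch reusing the simplified data from the example above, written with the object-oriented `ax` interface:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same simplified example data as above
df = pd.DataFrame({
    'Season': [2008, 2009, 2010, 2011],
    'Virat Kohli': [165, 246, 307, 557],
    'Rohit Sharma': [404, 362, 362, 372],
})

fig, ax = plt.subplots()
# One line per player column, labelled with the column name
for player in ['Virat Kohli', 'Rohit Sharma']:
    ax.plot(df['Season'], df[player], marker='o', label=player)

ax.set_title('IPL Runs Over Seasons')
ax.set_xlabel('Season')
ax.set_ylabel('Runs Scored')
ax.legend()
```

pandas also offers a shortcut, `df.plot(x='Season')`, which draws every other numeric column as a line using Matplotlib.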

---

#### 4. **Adding Labels and Title**

* Use the following to make the graph understandable:

  * `plt.title()` → Graph title
  * `plt.xlabel()` → X-axis label
  * `plt.ylabel()` → Y-axis label
  * `plt.legend()` → Add labels for different lines
  * `plt.style.use()` → Change graph appearance

```python
plt.title("Virat Kohli vs Rohit Sharma")
plt.xlabel("Season")
plt.ylabel("Runs Scored")
plt.legend()
```

---

#### 5. **Graph Styling**

* Different styles like `'classic'`, `'ggplot'`, `'seaborn-v0_8'` (called `'seaborn'` in older Matplotlib versions), etc.
* Default style can be restored using:

```python
plt.style.use('default')  # Resets to default matplotlib style
```

> ⚙️ Styling is useful for presentations and better visuals.
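
Two helpers worth knowing: `plt.style.available` lists the style names your installed version ships with, and `plt.style.context()` applies a style only inside a `with` block, leaving the global style untouched. A minimal sketch (the plotted values are the first three points of the notebook's price list):

```python
import matplotlib.pyplot as plt

# List the style names installed with your Matplotlib version
print(plt.style.available)

# Apply a style only inside the with-block; the global style is left untouched
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot([2015, 2016, 2017], [48000, 54000, 57000])
    ax.set_title('Drawn with the ggplot style')
```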

---

### 📘 **Pro Tips**

* Always **label your graph**; it helps when presenting to others.
* Use `plt.legend()` to distinguish multiple plots clearly.
* Adjust tick-label formatting if large numbers appear in scientific notation (e.g., a `1e6` multiplier shown at the top of the Y-axis instead of full values).
* Always inspect the data before plotting.
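
For the scientific-notation tip, a minimal sketch using the notebook's price list with its outlier: by default, once tick values reach about a million, Matplotlib shows a `1e6` multiplier, and `ticklabel_format` switches back to plain numbers.

```python
import matplotlib.pyplot as plt

year = [2015, 2016, 2017, 2018, 2019, 2020, 2021]
price = [48000, 54000, 57000, 49000, 47000, 45000, 4500000]

fig, ax = plt.subplots()
ax.plot(year, price)

# style='plain' writes full numbers instead of a "1e6" multiplier;
# useOffset=False also disables any "+offset" label on that axis
ax.ticklabel_format(axis='y', style='plain', useOffset=False)
```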

---

### ✅ Conclusion

This section teaches how to:

* Create simple line plots from lists
* Use real datasets with Pandas and Matplotlib
* Plot multiple series on one graph
* Make your plots **informative** using **titles, labels, and legends**

---

### ✅ **Summary – Matplotlib Customizations (Line Style, Width, Markers, Colors)**

This section focuses on **enhancing the appearance** and **readability** of line plots by customizing:

* Line style (`linestyle`)
* Line width (`linewidth`)
* Markers (`marker`)
* Marker size (`markersize`)

---

### 📌 **Important Concepts & Parameters**

#### 1. **Change Line Style (`linestyle`)**

You can customize how the line appears (solid, dashed, dotted, etc.) using the `linestyle` parameter.

```python
plt.plot(x, y, linestyle='--')  # Dashed line
```

🧩 **Available styles**:

| Symbol | Name        | Style           |
| ------ | ----------- | --------------- |
| `'-'`  | `'solid'`   | Solid (default) |
| `'--'` | `'dashed'`  | Dashed          |
| `':'`  | `'dotted'`  | Dotted          |
| `'-.'` | `'dashdot'` | Dash-dot        |
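
Each symbol also has a long name (`'dashed'` means the same as `'--'`, which is why the notebook's `'solid'` and `'dashdot'` work), and a tuple gives a custom dash pattern. A minimal sketch with made-up values:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
fig, ax = plt.subplots()

# Short symbol and long name are interchangeable
ax.plot(x, [1, 2, 3, 4], linestyle='--', label="'--'")
ax.plot(x, [2, 3, 4, 5], linestyle='dashed', label="'dashed'")

# Custom dash pattern: (offset, (on_length, off_length, ...)) in points
ax.plot(x, [3, 4, 5, 6], linestyle=(0, (5, 1, 1, 1)), label='custom')
ax.legend()
```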

---

#### 2. **Change Line Width (`linewidth`)**

To make the line thicker or thinner:

```python
plt.plot(x, y, linewidth=2)  # Thicker line
```

* Default is `1.5` (set by `rcParams['lines.linewidth']` in the default style)
* Higher value → thicker line

---

#### 3. **Add Markers at Data Points (`marker`)**

Use `marker` to highlight data points on the line:

```python
plt.plot(x, y, marker='o')  # Circle marker
```

🧩 **Common marker options**:

| Symbol | Shape          |
| ------ | -------------- |
| `'o'`  | Circle         |
| `'s'`  | Square         |
| `'^'`  | Triangle up    |
| `'D'`  | Diamond        |
| `'d'`  | Thin diamond   |
| `'+'`  | Plus           |
| `'x'`  | Cross (x)      |
| `'>'`  | Triangle right |
| `'<'`  | Triangle left  |
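
A minimal sketch that draws each marker from the table on its own row so the shapes can be compared directly (the x and y positions are arbitrary):

```python
import matplotlib.pyplot as plt

markers = ['o', 's', '^', 'D', 'd', '+', 'x', '>', '<']

fig, ax = plt.subplots()
# One horizontal row of three points per marker, with no connecting line
for row, m in enumerate(markers):
    ax.plot([1, 2, 3], [row] * 3, marker=m, markersize=10, linestyle='none')

# Label each row with the marker symbol that drew it
ax.set_yticks(range(len(markers)))
ax.set_yticklabels([repr(m) for m in markers])
```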

---

#### 4. **Customize Marker Size (`markersize`)**

To make markers bigger/smaller:

```python
plt.plot(x, y, marker='o', markersize=10)
```

> 🎯 Useful when you want data points to stand out visually.

---

#### 5. **Use Multiple Styles Together**

You can **combine all these parameters** to make a customized plot:

```python
plt.plot(x, y, 
         linestyle='--', 
         linewidth=2, 
         marker='d', 
         markersize=8)
```

> 🖌️ This gives full control over the visual representation of your line.
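
Matplotlib also accepts a compact format string as the third positional argument, combining color, line style and marker. A minimal sketch with made-up values showing that `'r--o'` matches the longer keyword form:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 15, 25]  # made-up values

fig, ax = plt.subplots()
# Format string: color 'r' (red) + line style '--' (dashed) + marker 'o' (circle)
line_short, = ax.plot(x, y, 'r--o')

# The same styling spelled out with keyword arguments (shifted up to stay visible)
line_long, = ax.plot(x, [v + 5 for v in y], color='red', linestyle='--', marker='o')
```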

---

### ✅ **What You Learned**

| Feature            | Parameter    | Use                               |
| ------------------ | ------------ | --------------------------------- |
| Line Style         | `linestyle`  | Make lines dashed, dotted, etc.   |
| Line Thickness     | `linewidth`  | Control the thickness of the line |
| Data Point Markers | `marker`     | Show points using symbols         |
| Marker Size        | `markersize` | Make markers larger or smaller    |

---

### 🧠 **Best Practices**

* Use **different styles** when plotting multiple lines to visually differentiate them.
* Use **markers** in time series to emphasize actual data points.
* Always **adjust styling** based on who will view the graph (e.g., managers, clients).

---

Legend location (`loc`) valid values:
- best (default)
- upper right
- upper left
- lower left
- lower right
- right
- center left
- center right
- lower center
- upper center
- center

---

### ✅ **Summary – Labelling, Axis Limits, Gridlines, and Plot Display**

---

#### 🎯 1. **Labelling Lines with `label` and `legend()`**

When plotting multiple lines, it can be **unclear which line represents what**. Use the `label` parameter to name each line, and use `plt.legend()` to display the legend on the plot.

```python
plt.plot(x, y1, label='Virat Kohli')
plt.plot(x, y2, label='Rohit Sharma')
plt.legend()
```

🧠 By default (`loc='best'`), Matplotlib places the legend where it overlaps the plotted data the least. You can override this with a fixed location such as:

* `'upper right'`
* `'lower left'`
* `'center'`, etc.

```python
plt.legend(loc='upper right')
```

📌 **Key Points**:

* Use `label` inside `plot()`.
* Use `plt.legend()` to show the legend.
* Control placement with `loc`.
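
When the legend still covers data, `bbox_to_anchor` can move it outside the axes. A minimal sketch using the simplified IPL numbers from earlier; `loc` chooses which corner of the legend box is pinned to the anchor point.

```python
import matplotlib.pyplot as plt

seasons = [2008, 2009, 2010, 2011]
fig, ax = plt.subplots()
ax.plot(seasons, [165, 246, 307, 557], label='Virat Kohli')
ax.plot(seasons, [404, 362, 362, 372], label='Rohit Sharma')

# Pin the legend's upper-left corner to the axes' top-right corner (1, 1),
# so the legend sits just outside the plot area
legend = ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
```

When saving, `plt.savefig(..., bbox_inches='tight')` keeps a legend placed outside the axes from being cut off.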

---

#### 🎯 2. **Limiting Axes with `plt.ylim()` and `plt.xlim()`**

A single **outlier value** stretches the axis and squashes the rest of the data into a narrow band. You can **restrict the visible axis range**:

```python
plt.ylim(0, 100000)        # Limit y-axis
plt.xlim(2017, 2019)       # Limit x-axis
```

🔍 **Why use it?**

* Keeps outliers from squashing the view (they are only hidden, not removed from the data)
* Highlights specific ranges
* Improves visualization clarity

📌 **Key Points**:

* Use when one or two points distort the overall plot.
* Makes graph readable by focusing on meaningful data ranges.
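
`plt.ylim()` and `plt.xlim()` can also set just one side, and they return the current limits when called with no arguments. A sketch with the notebook's price list, which contains a deliberate outlier:

```python
import matplotlib.pyplot as plt

year = [2015, 2016, 2017, 2018, 2019, 2020, 2021]
price = [48000, 54000, 57000, 49000, 47000, 45000, 4500000]  # last value is an outlier

plt.figure()
plt.plot(year, price)

# Setting only one bound leaves the other one unchanged
plt.ylim(bottom=0)
plt.ylim(top=75000)  # clips the view; the outlier is still in the data

# Called with no arguments, ylim() returns the current (bottom, top) limits
print(plt.ylim())
```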

---

#### 🎯 3. **Using Gridlines with `plt.grid()`**

Gridlines improve **readability** of plots, especially when:

* You want to know exact point locations
* Working with larger datasets

```python
plt.grid()
```

📌 **Key Points**:

* Adds horizontal and vertical reference lines.
* Useful for presentations, comparisons, analysis.
* Optional, but **recommended for clarity**.
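
`plt.grid()` (and `ax.grid()`) also accepts styling keywords and an `axis` argument. A minimal sketch with made-up values that draws only dashed, semi-transparent horizontal gridlines:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([2015, 2016, 2017, 2018], [48000, 54000, 57000, 49000])

# Only horizontal gridlines (axis='y'), dashed and semi-transparent
ax.grid(True, axis='y', linestyle='--', alpha=0.5)
```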

---

#### 🎯 4. **Displaying the Plot with `plt.show()`**

In non-interactive environments (like a plain Python script run from the terminal), plots **won’t display unless you explicitly call**:

```python
plt.show()
```

📌 **Key Points**:

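* Call it after all plotting commands; it displays every open figure.
* Not needed in Jupyter notebooks (figures render at the end of each cell), but **required in scripts**. In web apps such as Flask, save the figure with `plt.savefig()` instead of calling `plt.show()`.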

---

### ✅ **Quick Recap – Concepts & Parameters Table**

| Feature              | Function / Parameter                | Purpose                                 |
| -------------------- | ----------------------------------- | --------------------------------------- |
| **Legend**           | `label`, `plt.legend()`             | Identify which line belongs to whom     |
| **Legend Location**  | `loc='best'`, `'upper right'`, etc. | Manual/automatic legend placement       |
| **Limit Y-axis**     | `plt.ylim(min, max)`                | Focus on specific Y range               |
| **Limit X-axis**     | `plt.xlim(min, max)`                | Focus on specific X range               |
| **Enable Gridlines** | `plt.grid()`                        | Show background grid for reference      |
| **Show Plot**        | `plt.show()`                        | Required to display the plot in scripts |

---

### ✅ **Real-World Scenario Examples**

1. **Outlier Adjustment**:

   * A viral video with 1 crore (10 million) views flattens the trend of every other day? Use `ylim()` to trim the view.

2. **Focus on Range**:

   * Want to see sales only from 2017 to 2019? Use `xlim(2017, 2019)`.

3. **Clarity**:

   * Use gridlines to see at a glance where each point lies.

---

Here’s a **complete summary** of the concepts discussed about **Scatter Plot** and **2D Line Plot** using **Matplotlib in Python**, along with all **key points and takeaways**:

---

### ✅ **Concepts Covered**

1. **2D Line Plot (`plt.plot()`)**
2. **Scatter Plot (`plt.scatter()`)**
3. **Using marker size & labels**
4. **When to use `plt.plot()` vs `plt.scatter()`**
5. **Multi-dimensional scatter plots using `size (s)`**
6. **Comparison and performance discussion**

---

### 🔷 **1. 2D Line Plot – `plt.plot()`**

#### 📌 **Key Points:**

* Used to study the relationship between **two variables** (X vs Y).
* Requires **two columns**: one for the X-axis, one for the Y-axis.
* Common use: **Time-series data** (e.g., revenue over months).
* A **line is drawn** connecting the data points in order.
* **X-axis**: Often a categorical/time-based variable.
* **Y-axis**: Numerical variable.

#### 🧠 Example:

```python
import matplotlib.pyplot as plt
months, revenue = ['Jan', 'Feb', 'Mar'], [120, 135, 128]  # hypothetical sample data
plt.plot(months, revenue)
plt.show()
```

---

### 🔷 **2. Scatter Plot – `plt.scatter()`**

#### 📌 **Key Points:**

* Used for **bivariate analysis** with two numerical columns.
* Helpful for understanding the **correlation** between two quantities.
* Common use: Compare metrics like **Average vs Strike Rate**, **Bill vs Tip**, etc.
* Each data point is plotted as a **dot**, without lines connecting them.
* Helps in **pattern recognition** and **outlier detection**.

#### 🧠 Example:

```python
plt.scatter(x_values, y_values)
```
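
To put a number on the correlation a scatter plot suggests, NumPy's `np.corrcoef` returns the Pearson coefficient. A minimal sketch on synthetic data (a linear trend plus random noise, similar to the notebook's `linspace` example):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(-10, 10, 50)
y = 10 * x + 3 + rng.normal(0, 15, size=50)  # synthetic: linear trend plus noise

# Pearson correlation coefficient: near +1 means a strong positive linear relationship
r = np.corrcoef(x, y)[0, 1]

plt.figure()
plt.scatter(x, y)
plt.title(f'Synthetic data, r = {r:.2f}')
plt.xlabel('x')
plt.ylabel('y')
```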

---

### 🧪 **Visual Example Explained** (Scatter Plot):

* IPL Batsmen Dataset:

  * X-axis: Batting Average
  * Y-axis: Strike Rate
  * Each dot = one batsman
* Scatter plot shows:

  * Top-right: High average, high strike rate → **best batsman**
  * Bottom-left: Low on both → **not reliable**
  * Other combinations help team owners make decisions.
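
One way to make those four regions explicit is to draw reference lines at the mean of each axis with `plt.axvline()` and `plt.axhline()`. A sketch with hypothetical, randomly generated averages and strike rates (not the real dataset):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
avg = rng.uniform(15, 50, size=50)   # hypothetical batting averages
sr = rng.uniform(110, 160, size=50)  # hypothetical strike rates

plt.figure()
plt.scatter(avg, sr, color='red', marker='*')

# Reference lines at the means split the plot into four quadrants
plt.axvline(avg.mean(), color='grey', linestyle='--')
plt.axhline(sr.mean(), color='grey', linestyle='--')

plt.xlabel('Average')
plt.ylabel('Strike Rate')

# Batsmen in the top-right quadrant: above the mean on both metrics
top_right = np.sum((avg > avg.mean()) & (sr > sr.mean()))
print(top_right)
```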

---

### 🔷 **3. Adding Labels, Title, Marker Customizations**

#### 📌 Add Title, Axis Labels:

```python
plt.title("Average vs Strike Rate Analysis")
plt.xlabel("Average")
plt.ylabel("Strike Rate")
```

#### 📌 Change marker type or size:

```python
plt.scatter(x, y, marker='+', s=100)
```

---

### 🔷 **4. Multi-Dimensional Scatter Plot Using `s` Parameter**

#### 📌 You can visualize 3 dimensions:

* X-axis: Total Bill
* Y-axis: Tip
* Size of the dot (`s`): Number of people with the customer

#### 🧠 Example:

```python
plt.scatter(tips['total_bill'], tips['tip'], s=tips['size'] * 20)
```
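
Color can carry a fourth variable through the `c` argument plus a colormap, with `fig.colorbar()` as the key. The tips dataset needs a download, so this sketch uses synthetic stand-ins; `hour` is a hypothetical fourth column.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)
total_bill = rng.uniform(5, 50, size=40)              # synthetic bills
tip = total_bill * 0.15 + rng.normal(0, 1, size=40)   # synthetic tips
party_size = rng.integers(1, 7, size=40)              # synthetic number of people
hour = rng.uniform(12, 23, size=40)                   # hypothetical time of day

fig, ax = plt.subplots()
points = ax.scatter(total_bill, tip,
                    s=party_size * 20,  # 3rd variable: marker area
                    c=hour,             # 4th variable: marker color
                    cmap='viridis', alpha=0.7)
fig.colorbar(points, ax=ax, label='Hour of day')
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
```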

---

### 🔷 **5. `plt.plot()` vs `plt.scatter()`**

| Feature         | `plt.plot()`                                  | `plt.scatter()`                                     |
| --------------- | --------------------------------------------- | --------------------------------------------------- |
| Best for        | **Line charts**, time series                  | **Correlations**, numeric vs numeric                |
| Performance     | **Faster** (one line object for all points)   | Slower on large data (stores per-point properties)  |
| Visual elements | Connected lines (markers only with `'o'`)     | Dots (unconnected)                                  |
| Per-point size  | ❌ No (one `markersize` for all points)        | ✅ Yes (`s`)                                         |
| Per-point color | ❌ No (one color per line)                     | ✅ Yes (`c` with a colormap)                         |
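
One way to see the difference: `plt.plot()` returns a single `Line2D` object whose marker size and color are shared by every point, while `plt.scatter()` returns a `PathCollection` that stores values per point. A sketch with random data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 50, size=1000)
y = rng.uniform(0, 10, size=1000)

fig, ax = plt.subplots()

# plot with a marker and no line: one Line2D, one size/color shared by all points
line, = ax.plot(x, y, 'o', markersize=3)

# scatter: one PathCollection holding a separate size for each point
dots = ax.scatter(x, y, s=rng.uniform(5, 50, size=1000))

print(type(line).__name__, type(dots).__name__)  # Line2D PathCollection
```

Storing per-point properties is what makes `scatter` more flexible and is a likely reason it is slower on large datasets; for uniformly styled points, `plot(x, y, 'o')` is usually the lighter choice.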

---

### 🔷 **6. Behind the Scenes Insight**

* A **2D Line Plot** is technically a **special case of scatter plot** where:

  * You plot dots (like a scatter plot),
  * And then **connect those dots with a line**.

---

### ✅ **Conclusion**

* Use **`plt.scatter()`** for:

  * Numeric vs numeric comparisons.
  * Analyzing relationships, clusters, and trends.
  * Multi-variable encoding (via marker size, color).

* Use **`plt.plot()`** for:

  * Time-series trends.
  * Continuous data where connection between points matters.

---