## 🧠 Theoretical Questions

### 1. What is NumPy, and why is it widely used in Python ?

**NumPy** (Numerical Python) is a powerful Python library used for **numerical computing**. It provides efficient support for large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions to operate on them. 
<br><br>
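**<u>It is widely used</u>** because its array operations run in compiled code rather than in Python-level loops (**high performance**), it makes **array manipulation and vectorized operations** straightforward, and it serves as the array foundation for **scientific and machine learning libraries** such as Pandas, SciPy, and scikit-learn.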


####
### 2. How does broadcasting work in NumPy ?

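**Broadcasting in NumPy** is a mechanism that allows arrays of different shapes to be used together in arithmetic operations. Shapes are compared from the trailing dimension backward (missing leading dimensions are treated as size 1), and two dimensions are compatible when they are equal or one of them is 1. A dimension of size 1 is then virtually stretched to match the other, without copying the data. This makes computations more **efficient, memory-friendly, and concise**.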
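
As a small illustration of the idea, the sketch below adds a 1D row and a 2D column to a 2×3 matrix. The array names (`matrix`, `row`, `column`) are made up for this example.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
column = np.array([[100], [200]])    # shape (2, 1)

# row is treated as shape (1, 3) and stretched along axis 0 to (2, 3)
row_sum = matrix + row

# column has size 1 on axis 1, so it is stretched along that axis to (2, 3)
column_sum = matrix + column

print(row_sum)
print(column_sum)
```

Neither `row` nor `column` is physically copied to the larger shape; NumPy only behaves as if it were.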


####
### 3. What is a Pandas DataFrame ?

A **Pandas DataFrame** is a **two-dimensional, labeled data structure** in Python designed to represent **tabular data**. It organizes data into **rows (index)** and **columns (labels)**, where each column can hold different data types (heterogeneous). Built on top of **NumPy arrays**, it provides a flexible and intuitive way to handle structured data, making it similar to a **spreadsheet or SQL table**.
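
A minimal sketch of such a table, assuming three invented columns (`name`, `age`, `score`) and a custom row index, shows how each column keeps its own data type:

```python
import pandas as pd

# Each dictionary key becomes a column label; index supplies the row labels.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],
        "age": [31, 25, 42],
        "score": [88.5, 92.0, 79.5],
    },
    index=["a", "b", "c"],
)

print(df.dtypes)              # one dtype per column
print(df.loc["b", "score"])   # label-based lookup: row "b", column "score"
```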


####
### 4. Explain the use of the groupby() method in Pandas.

The **`groupby()`** method in Pandas is used to **split data into groups** based on one or more keys (columns), allowing for **aggregation, transformation, and analysis** on each group independently. It follows the **Split–Apply–Combine** strategy:

* **Split**: Divides data into groups.
* **Apply**: Performs functions like sum, mean, count, etc., on each group.
* **Combine**: Merges the results back into a DataFrame or Series.
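
The three steps above can be sketched on a toy sales table. The column names `region` and `revenue` are assumptions chosen for illustration.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South", "East"],
        "revenue": [100, 150, 200, 50, 300],
    }
)

# Split by region, apply sum to each group, combine into one Series.
totals = sales.groupby("region")["revenue"].sum()

# Several aggregations at once produce a DataFrame, one row per group.
summary = sales.groupby("region")["revenue"].agg(["sum", "mean", "count"])

print(totals)
print(summary)
```

By default the group keys are sorted, so the result is indexed by `East`, `North`, `South`.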


####
### 5. Why is Seaborn preferred for statistical visualizations ?

**Seaborn** is preferred for **statistical visualizations** because it is built on top of Matplotlib and provides a **high-level, easy-to-use interface** for creating informative and attractive plots. It comes with **built-in themes, color palettes, and functions** that simplify visualizing complex statistical relationships.

**<u>Key Reasons</u>**:

* Offers **concise syntax** for complex plots.
* Provides **statistical plots** like distribution, regression, and categorical plots.
* Automatically handles **aesthetics and styles** for better readability.
* Integrates seamlessly with **Pandas DataFrames**.
* Reduces code complexity while producing **publication-quality visuals**.


####
### 6. What are the differences between NumPy arrays and Python lists ?



| Aspect                | NumPy Arrays                                                   | Python Lists                                     |
| --------------------- | -------------------------------------------------------------- | ------------------------------------------------ |
| **Storage Type**      | Homogeneous – all elements must be of the same data type       | Heterogeneous – can store mixed data types       |
| **Memory Efficiency** | More compact and efficient                                     | Less memory efficient                            |
| **Performance**       | Supports fast, vectorized operations                           | Slower, requires explicit loops for operations   |
| **Functionality**     | Provides a wide range of mathematical and scientific functions | Limited functionality, mainly general-purpose    |
| **Dimensionality**    | Can be multi-dimensional (1D, 2D, 3D, etc.)                    | Primarily 1D; higher dimensions via nested lists |
| **Best Suited For**   | Numerical and scientific computing                             | General-purpose data storage                     |
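
A short sketch, using invented variable names, makes two of the rows above concrete: the `*` operator means different things for the two types, and a NumPy array settles on a single data type.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

# Element-wise doubling: a list needs a loop or comprehension ...
doubled_list = [v * 2 for v in values_list]
# ... while the array applies the operation to every element at once.
doubled_array = values_array * 2

# On a list, * repeats the sequence instead of doing arithmetic.
repeated_list = values_list * 2

# Mixed numeric input is upcast to one common dtype.
mixed = np.array([1, 2.5, 3])

print(doubled_list, doubled_array, repeated_list, mixed.dtype)
```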


####
### 7. What is a heatmap, and when should it be used ?

A **heatmap** is a **data visualization technique** that represents values in a matrix or 2D data using **color gradients**, where varying intensities of color indicate differences in magnitude.

**<u>When to Use</u>**:

* To show **correlation between variables** (e.g., correlation matrix).
* To identify **patterns, trends, or anomalies** in large datasets.
* To visualize **frequency, density, or intensity** of data values.
* Useful in **exploratory data analysis (EDA)** for quick insights.


####
### 8. What does the term “vectorized operation” mean in NumPy ?

A **vectorized operation** in NumPy means applying a computation directly on entire arrays (vectors, matrices, etc.) without writing explicit loops, allowing faster execution through optimized low-level implementations.
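
To make the contrast concrete, here is a minimal sketch that converts temperatures with an explicit loop and then with a single vectorized expression. The Celsius-to-Fahrenheit task is only an example chosen for illustration.

```python
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# Explicit loop: one Python-level iteration per element.
fahrenheit_loop = np.empty_like(celsius)
for i in range(celsius.size):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorized: the same arithmetic expressed on the whole array.
fahrenheit_vec = celsius * 9 / 5 + 32

print(fahrenheit_loop)
print(fahrenheit_vec)
```

Both approaches give the same values; the vectorized form hands the loop to NumPy's compiled routines.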


####
### 9. How does Matplotlib differ from Plotly ?

| Aspect              | **Matplotlib**                                            | **Plotly**                                                        |
| ------------------- | --------------------------------------------------------- | ----------------------------------------------------------------- |
| **Type**            | Low-level, static visualization library                   | High-level, interactive visualization library                     |
| **Interactivity**   | Primarily static plots (basic interactivity via toolkits) | Built-in interactivity (hover, zoom, pan)                         |
| **Ease of Use**     | Requires more code for complex visuals                    | Concise syntax for complex, interactive plots                     |
| **Customization**   | Highly customizable but verbose                           | Good customization with simpler syntax                            |
| **Output**          | Generates static images (PNG, PDF, etc.)                  | Generates interactive web-based visualizations (HTML, dashboards) |
| **Best Suited For** | Traditional scientific plotting and publications          | Interactive dashboards, data exploration, and web apps            |


####
### 10. What is the significance of hierarchical indexing in Pandas ?

* **Hierarchical indexing (MultiIndex)** in Pandas allows multiple levels of indexing on rows and/or columns.
* Represents higher-dimensional data within a **2D DataFrame**.
* Enables **complex data selection, slicing, and subsetting** across multiple keys.
* Useful for **grouped, panel, or pivoted data** structures.
* Makes data analysis more **flexible and structured** without expanding dimensions unnecessarily.
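
The points above can be sketched with a two-level index. The level names `year` and `quarter` and the revenue figures are invented for this example.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([10, 12, 15, 18], index=index, name="revenue")

# Select by the outer level: returns a Series indexed by quarter.
year_2024 = revenue.loc["2024"]

# Cross-section on the inner level: Q1 across all years.
q1_all_years = revenue.xs("Q1", level="quarter")

# Move the inner level to the columns to get a 2D table.
wide = revenue.unstack("quarter")

print(year_2024)
print(q1_all_years)
print(wide)
```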


####
### 11. What is the role of Seaborn’s pairplot() function ?

The **`pairplot()`** function in Seaborn visualizes **pairwise relationships** between the numeric variables in a dataset: each off-diagonal cell is a scatterplot of one pair of variables, and each diagonal cell shows the distribution of a single variable as a histogram or KDE plot. It is commonly used in **exploratory data analysis (EDA)** to quickly detect correlations, trends, and distributions across multiple variables.


####
### 12. What is the purpose of the describe() function in Pandas ?

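The **`describe()`** function in Pandas generates **summary statistics** of a DataFrame or Series, including count, mean, standard deviation, min, max, and quartiles. For a DataFrame with mixed column types it summarizes only the numeric columns by default; passing `include="all"` also covers non-numeric columns, which are reported with count, unique, top, and freq. It is mainly used for **quick exploratory analysis** to understand the central tendency, dispersion, and distribution of data.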
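
A small sketch on an invented two-column table (`height_cm`, `city`) shows the default output and the effect of `include="all"`:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "height_cm": [150, 160, 170, 180],
        "city": ["Pune", "Delhi", "Pune", "Goa"],
    }
)

# Default: only the numeric column is summarized.
numeric_summary = df.describe()

# include="all" adds the text column, with count/unique/top/freq rows.
all_summary = df.describe(include="all")

print(numeric_summary)
print(all_summary)
```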


####
### 13. Why is handling missing data important in Pandas ?

Handling missing data is important in Pandas because incomplete values can lead to **inaccurate analysis, biased results, or errors in computations**. Proper handling ensures **data quality, consistency, and reliability** for statistical analysis and machine learning models.
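
As a hedged illustration, the sketch below shows three common ways to deal with gaps: counting them, dropping incomplete rows, and filling with the column mean. The column names are made up, and which strategy is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "temp": [21.0, np.nan, 23.0],
        "humidity": [40.0, 45.0, np.nan],
    }
)

# Count missing values per column.
missing_per_column = df.isna().sum()

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill each gap with the mean of its column.
filled = df.fillna(df.mean())

print(missing_per_column)
print(dropped)
print(filled)
```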


####
### 14. What are the benefits of using Plotly for data visualization ?

Plotly provides **interactive, web-based visualizations** that make data exploration more engaging and insightful. It supports a wide range of chart types, offers **easy integration with Python and Pandas**, and is ideal for building **dashboards, reports, and data apps** with minimal code.


####
### 15. How does NumPy handle multidimensional arrays ?

NumPy handles multidimensional arrays using the **`ndarray`** object, which can represent data in any number of dimensions (1D, 2D, 3D, etc.). It provides efficient storage, indexing, slicing, and mathematical operations across dimensions, making it powerful for scientific and numerical computing.
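
A minimal sketch with a 3D array (the name `cube` is arbitrary) shows shape inspection, indexing, slicing, and an axis-wise reduction:

```python
import numpy as np

# 24 consecutive integers arranged as 2 blocks x 3 rows x 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

print(cube.ndim, cube.shape)

# One index per axis selects a single element.
element = cube[1, 2, 3]

# Slicing: row 0 of every block, all columns -> shape (2, 4).
first_rows = cube[:, 0, :]

# Reduce along axis 0: sums the two blocks element-wise -> shape (3, 4).
block_totals = cube.sum(axis=0)

print(element)
print(first_rows)
print(block_totals)
```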


####
### 16. What is the role of Bokeh in data visualization ?

**Bokeh** is a Python library for creating **interactive, web-ready visualizations**. It allows building dashboards and plots that can handle **large datasets efficiently**, with features like zooming, panning, and real-time updates, making it suitable for both **exploration and presentation**.


####
### 17. Explain the difference between apply() and map() in Pandas.

* **`map()`**: A **Series** method that transforms each value individually. It accepts a function, a dictionary, or another Series as the mapping.
* **`apply()`**: Works on both **Series and DataFrames**. On a Series it calls the function on each value; on a DataFrame it passes each column (`axis=0`, the default) or each row (`axis=1`) to the function as a Series, which suits more complex operations.
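
The difference can be sketched on a small invented DataFrame with columns `a` and `b`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# map(): element-wise on a Series, with a function ...
squared = df["a"].map(lambda x: x ** 2)
# ... or with a dictionary used as a lookup table.
labels = df["a"].map({1: "low", 2: "mid", 3: "high"})

# apply() on a DataFrame, axis=0 (default): the function receives each column.
column_range = df.apply(lambda col: col.max() - col.min())

# apply() with axis=1: the function receives each row.
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(squared.tolist(), labels.tolist())
print(column_range)
print(row_sum.tolist())
```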


####
### 18. What are some advanced features of NumPy ?

Some advanced features of **NumPy** include:

* **Broadcasting** for operations on arrays of different shapes.
* **Vectorization** for fast, loop-free computations.
* **Linear algebra functions** (matrix multiplication, eigenvalues, etc.).
* **Fourier transforms** through the `numpy.fft` module (broader signal processing tools live in SciPy).
* **Random number generation** and probability distributions.
* **Masked arrays** for handling missing or invalid data.
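
A few of the features listed above can be touched on in one short sketch. The input values, the seed, and the `-999.0` sentinel for invalid readings are arbitrary choices for illustration.

```python
import numpy as np

# Linear algebra: eigenvalues of a diagonal matrix are its diagonal entries.
a = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigenvalues = np.sort(np.linalg.eigvals(a))

# Fourier transform: a unit impulse has a flat spectrum.
spectrum = np.fft.fft([1, 0, 0, 0])

# Random numbers: a seeded Generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)

# Masked arrays: hide sentinel values so they are skipped in statistics.
readings = np.ma.masked_values([1.0, -999.0, 3.0], -999.0)
valid_mean = readings.mean()

print(eigenvalues, spectrum, samples.shape, valid_mean)
```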


####
### 19. How does Pandas simplify time series analysis ?

Pandas simplifies time series analysis through features like **datetime indexing**, **resampling**, and **frequency conversion**. It provides built-in functions for **date parsing, shifting, rolling statistics, and window operations**, making it easy to analyze trends, seasonality, and time-based patterns in data.
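
A brief sketch on six days of invented daily values shows resampling, a rolling window, and shifting:

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Resample daily data into 2-day bins and sum each bin.
two_day_totals = ts.resample("2D").sum()

# 3-day rolling mean; the first two entries have no full window yet (NaN).
rolling_mean = ts.rolling(window=3).mean()

# Shift values forward by one period, e.g. to compare with the previous day.
shifted = ts.shift(1)

print(two_day_totals)
print(rolling_mean)
print(shifted)
```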


####
### 20. What is the role of a pivot table in Pandas ?

A **pivot table** in Pandas is used to **summarize and reorganize data** by grouping values across one or more keys and applying aggregation functions like sum, mean, or count. It helps in **quickly analyzing patterns and relationships** within large datasets.
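
As an illustration, the sketch below pivots a small invented sales table so that regions become rows, products become columns, and duplicate region/product pairs are summed:

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["North", "North", "North", "South", "South"],
        "product": ["A", "A", "B", "A", "B"],
        "revenue": [100, 50, 200, 150, 50],
    }
)

# Rows: region, columns: product, cell value: summed revenue.
pivot = sales.pivot_table(
    index="region", columns="product", values="revenue", aggfunc="sum"
)

print(pivot)
```

The two `North`/`A` rows collapse into a single cell, which is the aggregation step that distinguishes `pivot_table()` from a plain reshape.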


####
### 21. Why is NumPy’s array slicing faster than Python’s list slicing ?

NumPy’s basic slicing is faster than Python’s list slicing because it returns a **view**: a new array object that points into the same **contiguous memory block**, so no elements are copied regardless of the slice length. Slicing a Python list builds a new list and copies a reference to every selected element, so its cost grows with the size of the slice. Note that indexing with an integer or boolean array (advanced indexing) does copy the data.
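
The view-versus-copy behavior can be checked directly. In this sketch (variable names are arbitrary), writing through the array slice changes the original array, while the list slice is independent:

```python
import numpy as np

data_list = [0, 1, 2, 3, 4]
data_array = np.arange(5)

list_slice = data_list[1:4]     # new list with copied references
array_slice = data_array[1:4]   # view into data_array's buffer

list_slice[0] = 99              # does not affect data_list
array_slice[0] = 99             # writes through to data_array

print(data_list)
print(data_array)
print(np.shares_memory(data_array, array_slice))
```

When an independent array is needed, calling `.copy()` on the slice avoids modifying the original by accident.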


####
### 22. What are some common use cases for Seaborn ?

Common use cases for **Seaborn** include:

* **Visualizing distributions** (histograms, KDE plots).
* **Exploring relationships** between variables (scatterplots, pairplots).
* **Comparing categories** with bar plots, box plots, and violin plots.
* **Heatmaps** for correlation or matrix data.
* Creating **aesthetically styled statistical plots** with minimal code.
