## ANSWER

1. NumPy is a fundamental Python library used for numerical computing, providing efficient tools for working with multi-dimensional arrays and matrices. It's used in Python for its speed, efficiency, and ability to perform complex mathematical operations on large datasets, making it an essential library in fields like data science and scientific computing.

2. The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
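
A minimal sketch of broadcasting, using made-up arrays (`matrix`, `row`, and `column` are illustrative names, not part of the original answer). It shows a 1D row and a column vector each being stretched to match a 3x3 array:

```python
import numpy as np

# A (3, 3) matrix and a (3,) vector: the shapes differ, but the trailing
# dimension matches, so NumPy stretches the vector across every row.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([10, 20, 30])

row_sum = matrix + row
print(row_sum)

# A (3, 1) column is stretched across the columns instead.
column = np.array([[100], [200], [300]])
column_sum = matrix + column
print(column_sum)
```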

3. A Pandas DataFrame is a two-dimensional data structure for representing and working with tabular data. It can be seen as a table that organizes data into rows and columns. A DataFrame can be created from scratch or from other data structures, such as NumPy arrays.
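
As a small illustration, a DataFrame can be built from a NumPy array; the column names `a` and `b` below are hypothetical and chosen only for this sketch:

```python
import numpy as np
import pandas as pd

# Three rows, two columns of raw values
values = np.array([[1, 2], [3, 4], [5, 6]])

# Wrap the array in a DataFrame and give the columns names
frame = pd.DataFrame(values, columns=["a", "b"])

print(frame)
print(frame.shape)  # (number of rows, number of columns)
```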

4. The groupby function in Pandas organizes data into groups based on certain criteria, such as the values in a column, which makes it easier to analyze and summarize the data. For example, to analyze the weight of people in different cities, you could group the records by city and then summarize the weights within each group.
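
The weight-by-city idea could be sketched as follows. The data, the city names, and the column names `city` and `weight_kg` are invented for illustration:

```python
import pandas as pd

# Hypothetical records: one row per person
people = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi", "Delhi"],
    "weight_kg": [60.0, 70.0, 50.0, 60.0, 70.0],
})

# Split the rows by city, then average the weights inside each group
mean_weight = people.groupby("city")["weight_kg"].mean()
print(mean_weight)
```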

5. Ease of use: One of the main advantages of Seaborn over Matplotlib is its ease of use. Seaborn is built on top of Matplotlib and provides a higher-level interface for creating statistical graphics. This means that Seaborn requires less code to create complex visualizations compared to Matplotlib.

6. The biggest difference is NumPy arrays use fewer resources than Python lists, which becomes important when storing a large amount of data. If you're working with thousands of elements, Python lists will be fine for most purposes.

7. A heatmap is a data visualization technique that uses color to represent the magnitude of values across a two-dimensional space. It's essentially a table or grid where each cell's color corresponds to a specific value, allowing for a quick visual overview of data patterns. Heatmaps are useful for analyzing website user behavior, visualizing complex data sets, and identifying trends or correlations.

8. Vectorized operations take advantage of low-level optimizations implemented in libraries like NumPy (which pandas is built on). Instead of looping over elements one at a time in Python, these operations apply a function to an entire array (or DataFrame/Series) at once, leveraging highly efficient C and Fortran code under the hood.
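
A rough comparison of the two styles, with an illustrative `prices` array that is not from the original answer. Both versions compute the same result; the vectorized one does it in a single array expression:

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0, 40.0])

# Loop version: one Python-level multiplication per element
looped = np.array([p * 1.1 for p in prices])

# Vectorized version: one expression applied to the whole array
vectorized = prices * 1.1

print(np.allclose(looped, vectorized))
```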

9. Matplotlib is often preferred for academic or highly customized plots because you can fine-tune just about any aspect of the figure, such as fonts, margins, and axis scales. Plotly is still highly customizable, but its real strength lies in interactivity and web-based visuals.

10. Hierarchical Indexing, also known as MultiIndexing, is a powerful feature in Pandas that allows you to have multiple levels of indexing on an axis (row or column). This capability is particularly useful when dealing with high-dimensional data.
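
A minimal sketch of a two-level row index; the `year`/`quarter` levels and the sales figures are made up for this example:

```python
import pandas as pd

# Two index levels: year (outer) and quarter (inner)
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
sales = pd.Series([100, 120, 130, 150], index=index, name="sales")

# Select every quarter of one year using only the outer level
year_2024 = sales.loc["2024"]

# Select a single value using both levels
q2_2023 = sales.loc[("2023", "Q2")]

# Aggregate over the inner level by grouping on the outer one
yearly_total = sales.groupby(level="year").sum()

print(year_2024)
print(q2_2023)
print(yearly_total)
```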

11. Seaborn's pairplot() function creates a grid of plots that visually shows pairwise relationships between all variables in a dataset, offering a quick overview of distributions and correlations. It plots histograms on the diagonal to show individual variable distributions and scatter plots on the off-diagonal to represent relationships between pairs of variables.

12. The purpose of the describe() function in Pandas is to generate descriptive statistics for a DataFrame or Series. It provides a summary of the data, including the count, mean, standard deviation, minimum, 25th percentile (Q1), median (50th percentile or Q2), 75th percentile (Q3), and maximum values for each numerical column. For non-numerical columns, it also provides statistics like unique values, top value, and frequency.
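
A short sketch of both cases, using a small invented table (`staff`, `age`, and `team` are illustrative names). Calling `describe()` on the DataFrame summarizes the numeric column, and calling it on the text column returns the count, unique, top, and freq entries:

```python
import pandas as pd

staff = pd.DataFrame({
    "age": [25, 30, 35, 40],
    "team": ["a", "b", "a", "a"],
})

# By default only numeric columns are summarized
numeric_summary = staff.describe()

# On a non-numeric Series, describe() reports count, unique, top and freq
team_summary = staff["team"].describe()

print(numeric_summary)
print(team_summary)
```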

13. Handling missing data in Pandas is crucial for data integrity, accurate analysis, and reliable machine learning model performance. Unaddressed missing values can lead to skewed results, biased models, and inaccurate insights.
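
Common ways to inspect and handle missing values can be sketched like this. The `scores` table is hypothetical, and filling with the column mean is just one possible strategy, not a recommendation for every dataset:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "score": [85.0, np.nan, 90.0],
})

# Count the missing values in each column
missing_per_column = scores.isna().sum()

# Option 1: drop every row that contains a missing value
dropped = scores.dropna()

# Option 2: fill the gap, here with the mean of the observed scores
filled = scores.fillna({"score": scores["score"].mean()})

print(missing_per_column)
print(dropped)
print(filled)
```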

14. Pros: High interactivity: gives the end-user a wealth of options to interact with data through zooming, panning, and hovering, letting them explore the data on their own terms. Versatile chart options: from bar charts to heatmaps, there are many options for different data types and use cases.

15. In NumPy, the data of any multi-dimensional array is stored in memory as one long 1D block. The number of dimensions and the shape of an array are only used to structure how that data is accessed.
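
One way to see this, as a sketch with illustrative names: reshaping a contiguous 1D array gives a 2D view of the same memory, so a change made through one shape is visible through the other.

```python
import numpy as np

flat = np.arange(6)          # six values in one 1D block
grid = flat.reshape(2, 3)    # the same values viewed as 2 rows x 3 columns

# Writing through the 2D view changes the underlying 1D data
grid[0, 0] = 99

print(flat[0])
print(np.shares_memory(flat, grid))
print(grid.ravel())          # back to the flat layout
```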

16. Bokeh is a Python library primarily used for creating interactive data visualizations that target modern web browsers. It allows users to build a wide range of visualizations, from simple plots to complex dashboards, and focuses on delivering JavaScript-powered visualizations without requiring extensive JavaScript coding. Bokeh excels at handling large datasets efficiently and offers features like zooming, panning, and tooltips for interactive exploration.

17. 'map' applies a function to each element of a Series and returns a new Series with the function applied. It is often used when a transformation or substitution is needed. 'apply' can be used on both Series (element-wise) and DataFrames (along rows or columns), and returns a new DataFrame or Series.
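
A small sketch of the difference, on an invented two-column DataFrame. The lambda functions and the substitution dictionary are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# map: element-wise transformation of a single Series
doubled = df["a"].map(lambda x: x * 2)

# map: substitution through a dictionary lookup
labels = df["a"].map({1: "one", 2: "two", 3: "three"})

# apply on a DataFrame: the function receives one column at a time (axis=0)
spread = df.apply(lambda col: col.max() - col.min())

# apply with axis=1: the function receives one row at a time
row_totals = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(doubled, labels, spread, row_totals, sep="\n")
```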

18. NumPy, a fundamental Python library for scientific computing, offers several advanced features beyond its core functionalities. These include broadcasting, structured arrays, fancy indexing, and advanced array manipulation. Additionally, NumPy provides functionalities for universal functions (ufuncs), linear algebra, and performance optimization.
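
A few of these features can be sketched briefly. All array contents and field names below are made up for illustration:

```python
import numpy as np

values = np.array([5, 12, 7, 20, 3])

# Fancy indexing: pick elements by a list of positions
picked = values[[0, 3]]

# Boolean masking: keep only the elements that satisfy a condition
large = values[values > 6]

# Universal function (ufunc): element-wise square root
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Structured array: named fields with different data types
people = np.array(
    [("Alice", 25), ("Bob", 30)],
    dtype=[("name", "U10"), ("age", "i4")],
)
ages = people["age"]

print(picked, large, roots, ages)
```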

19. Many data analysts choose Pandas for time series analysis because it provides easy ways of manipulating the data. For example, the DatetimeIndex makes performing most operations in Pandas very simple, as it allows you to index, slice, and resample data based on the date and time.
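
A minimal sketch of date-based slicing and resampling, assuming a daily series of invented readings:

```python
import pandas as pd

# Six consecutive daily readings
dates = pd.date_range("2024-01-01", periods=6, freq="D")
readings = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Slice by date labels (both endpoints are included)
first_three = readings.loc["2024-01-01":"2024-01-03"]

# Resample into two-day bins and sum the values in each bin
two_day_totals = readings.resample("2D").sum()

print(first_three)
print(two_day_totals)
```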

20. Pivoting allows you to restructure a DataFrame by turning rows into columns and columns into rows based on a specified index column, a specified columns column, and a specified values column. This creates a summary table of the data that is easy to read and analyze.
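
Assuming the restructuring described here is done with pandas' `pivot_table`, it might look like the sketch below. The `sales` data and its column names are hypothetical:

```python
import pandas as pd

# Long format: one row per (region, product) pair
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "units": [10, 20, 30, 40],
})

# Regions become the row index, products become columns, units fill the cells
table = sales.pivot_table(
    index="region", columns="product", values="units", aggfunc="sum"
)
print(table)
```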

21. NumPy arrays are faster because their elements are stored in one contiguous block of memory. Array slicing means taking the elements from one given index to another given index.
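
Slicing can be sketched as follows, on small example arrays that are not part of the original answer. The stop index is exclusive in each slice:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

middle = arr[1:4]       # elements at index 1, 2 and 3
every_other = arr[::2]  # every second element, starting at index 0

# In 2D, give one slice per axis: rows first, then columns
grid = np.arange(12).reshape(3, 4)
block = grid[0:2, 1:3]

print(middle, every_other, block, sep="\n")
```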

22. Seaborn is primarily used for statistical data visualization, making it a powerful tool for exploratory data analysis (EDA), understanding data relationships, and communicating insights through visualizations. It's particularly helpful for tasks like identifying trends, visualizing correlations, and exploring data distributions.

In [None]:
## CODE

In [None]:
1.
import numpy as np

# Build a 3x3 two-dimensional array from a nested list
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr)

In [None]:
2.
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)

salary_mean = df['Salary'].mean()

print(f"The mean salary is: {salary_mean}")

In [None]:
3.
import matplotlib.pyplot as plt

x = [10, 20, 30, 40, 50]
y = [15, 25, 35, 45, 55]

plt.scatter(x, y, color='green', marker='x', s=100)  # 's' sets the size of points

plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Example Scatter Plot')

plt.grid(True)
plt.show()

In [None]:
4.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 4, 5, 6],
}

df = pd.DataFrame(data)

corr = df.corr()

plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', linewidths=0.5)

plt.title('Correlation Matrix Heatmap')

plt.show()


In [None]:
5.
import pandas as pd
import plotly.express as px

data = {
    'Fruits': ['Apples', 'Bananas', 'Cherries', 'Dates'],
    'Quantity': [10, 15, 7, 5]
}

df = pd.DataFrame(data)

fig = px.bar(df, x='Fruits', y='Quantity', title='Fruit Quantity Bar Plot')

fig.show()

In [None]:
6.
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [85, 62, 90, 70]
}

df = pd.DataFrame(data)

df['Passed'] = df['Score'] >= 70

print(df)

In [None]:
7.
import numpy as np

array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])

result = array1 * array2

print("Array 1:", array1)
print("Array 2:", array2)
print("Element-wise multiplication:", result)


In [None]:
8.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [1, 4, 9, 16, 25]
y2 = [2, 3, 5, 7, 11]
y3 = [5, 3, 2, 4, 1]

plt.plot(x, y1, label='y = x²', color='blue', linestyle='-')
plt.plot(x, y2, label='Prime Numbers', color='green', linestyle='--')
plt.plot(x, y3, label='Random', color='red', linestyle='-.')

plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title('Multiple Line Plot Example')

plt.legend()

plt.grid(True)

plt.show()

In [None]:
9.
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 17, 35, 16, 29]
}

df = pd.DataFrame(data)

age_threshold = 18

filtered_df = df[df['Age'] > age_threshold]

print(filtered_df)

In [None]:
10.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(loc=50, scale=10, size=1000)

sns.histplot(data, bins=30, kde=True, color='skyblue')

plt.title('Distribution of Values')
plt.xlabel('Value')
plt.ylabel('Frequency')

plt.show()

In [None]:
11.
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

result = np.matmul(A, B)

print("Matrix A:\n", A)
print("Matrix B:\n", B)
print("A x B:\n", result)

In [None]:
12.
import pandas as pd

# 'your_file.csv' is a placeholder; replace it with the path to your CSV file
df = pd.read_csv('your_file.csv')

print(df.head())

In [None]:
13.
import plotly.express as px
import pandas as pd

data = {
    'X': [1, 2, 3, 4, 5],
    'Y': [6, 7, 8, 9, 10],
    'Z': [11, 12, 13, 14, 15],
    'Color': [5, 10, 15, 20, 25]
}

df = pd.DataFrame(data)

fig = px.scatter_3d(df, x='X', y='Y', z='Z', color='Color', title='3D Scatter Plot Example')

fig.show()