# Scientific Libraries and Data Visualization in Python

## 4. Scientific Libraries in Python

Python offers powerful libraries for scientific computing, such as NumPy for numerical operations and Pandas for data manipulation (which is itself built on top of NumPy).

### NumPy for Numerical Operations
NumPy is the fundamental package for numerical computation in Python. It provides support for arrays and matrices, along with a collection of mathematical functions to operate on them.

#### Example: Basic NumPy Operations

In [None]:
import numpy as np

# Creating a NumPy array
data = np.array([1, 2, 3, 4, 5])

# Array operations
mean = np.mean(data)
sum_data = np.sum(data)
squared_data = np.square(data)

print(f"Mean: {mean}, Sum: {sum_data}, Squared: {squared_data}")
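
The description above also mentions matrices, which the 1D example does not cover. As a minimal sketch (the `matrix` values are made up for illustration), the cell below shows a 2D array, element-wise arithmetic, and aggregation along an axis:

```python
import numpy as np

# A 2D array (matrix) with 2 rows and 3 columns; values are illustrative only
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Arithmetic is applied element by element, with no explicit loop
doubled = matrix * 2

# Aggregations can run along an axis:
# axis=0 combines the rows (one result per column),
# axis=1 combines the columns (one result per row)
column_sums = matrix.sum(axis=0)
row_means = matrix.mean(axis=1)

print(f"Shape: {matrix.shape}")
print(f"Column sums: {column_sums}, Row means: {row_means}")
```

Working on whole arrays like this is usually both shorter and faster than looping over the elements in plain Python.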

### Pandas for Data Manipulation
Pandas is a powerful library for data manipulation and analysis, providing data structures like Series (1D) and DataFrame (2D) that make it easy to work with structured data.

#### Example: Basic Pandas Operations

In [None]:
import pandas as pd

# Creating a Pandas DataFrame
data = {
    "sample_id": [101, 102, 103, 104],
    "temperature": [23.5, 24.1, 22.8, 23.9],
    "measurement": ["viscosity", "density", "viscosity", "density"],
    "value": [12.3, 0.89, 13.1, 0.92]
}

df = pd.DataFrame(data)

# Accessing columns and rows
temperatures = df["temperature"]
first_row = df.iloc[0]

# Basic operations
mean_temperature = df["temperature"].mean()

# Display the dataframe
df
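
Two operations that tend to come up right after building a DataFrame are filtering rows and summarizing by group. The sketch below rebuilds the same sample table so the cell runs on its own; the variable names `viscosity_rows` and `mean_by_measurement` are chosen here for illustration:

```python
import pandas as pd

# Same sample data as in the example above, repeated so this cell is self-contained
df = pd.DataFrame({
    "sample_id": [101, 102, 103, 104],
    "temperature": [23.5, 24.1, 22.8, 23.9],
    "measurement": ["viscosity", "density", "viscosity", "density"],
    "value": [12.3, 0.89, 13.1, 0.92],
})

# A boolean mask keeps only the rows where the condition is True
viscosity_rows = df[df["measurement"] == "viscosity"]

# groupby splits the table by measurement type, then averages "value" in each group
mean_by_measurement = df.groupby("measurement")["value"].mean()

print(viscosity_rows)
print(mean_by_measurement)
```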

## How To Know These Functions Exist

Jupyter Notebook provides built-in tools to help you explore the functionalities of Python libraries interactively. Two of the most useful features for this purpose are the question mark (`?`) for accessing documentation, and tab completion for discovering available methods and attributes.

### Using the Question Mark (?)

The question mark (`?`) can be used to access documentation for functions, methods, and modules directly within a Jupyter Notebook cell. This feature provides a quick way to get an overview of the functionality, parameters, and other useful information.

#### Example: Accessing Documentation
To view the documentation for Pandas' DataFrame, you can enter the following in a cell:

In [None]:
import pandas as pd
pd.DataFrame?
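
The `?` syntax only works in IPython and Jupyter. In a plain Python script or interpreter, roughly the same information is available through the built-in `help()` function, the `__doc__` attribute, and the standard `inspect` module. A small sketch, using `DataFrame.head` as an arbitrary example method:

```python
import inspect

import pandas as pd

# The docstring shown by `?` is stored on the object itself
doc = pd.DataFrame.__doc__

# Print just the first few lines rather than the full text
first_lines = doc.strip().splitlines()[:3]
print("\n".join(first_lines))

# inspect.signature reports the parameters a function or method accepts
sig = inspect.signature(pd.DataFrame.head)
print(f"DataFrame.head{sig}")

# For the full documentation of a single method, call: help(pd.DataFrame.head)
```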

### Using Tab Completion

Tab completion is particularly useful for exploring the attributes and methods available in a library, such as different functions that can be applied to a data structure.

In [None]:
# Place the cursor at the end of the next line and press Tab to see which formats pandas can read:
pd.read_
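
If tab completion is not available (for example, in a plain script), a similar list can be produced with the built-in `dir()` function. This is a sketch of the idea rather than what Jupyter does internally, and the exact names returned depend on the installed pandas version:

```python
import pandas as pd

# dir() lists the names defined on an object; keep those starting with "read_"
readers = sorted(name for name in dir(pd) if name.startswith("read_"))

print(readers)
```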

## 5. Data Visualization

Data visualization is a crucial aspect of data analysis. It helps in understanding the data better and communicating results effectively. Matplotlib is a popular library for creating visualizations in Python.

### Matplotlib for Basic Plots
Matplotlib is a versatile library for creating static, animated, and interactive visualizations.

#### Example: Basic Plotting with Matplotlib

In [None]:
import matplotlib.pyplot as plt

# Scatter plot
plt.figure(figsize=(8, 4))
plt.scatter(df["temperature"], df["value"])
plt.xlabel("Temperature")
plt.ylabel("Value")
plt.grid(True)
plt.show()
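
The scatter plot above puts viscosity and density values on the same axis without distinguishing them. One possible refinement, sketched here with the same sample data, is to plot each measurement type as its own series and add a legend. It uses the `fig, ax = plt.subplots()` style, which is one common way to write Matplotlib code, not the only one:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the Pandas example, repeated so this cell is self-contained
df = pd.DataFrame({
    "sample_id": [101, 102, 103, 104],
    "temperature": [23.5, 24.1, 22.8, 23.9],
    "measurement": ["viscosity", "density", "viscosity", "density"],
    "value": [12.3, 0.89, 13.1, 0.92],
})

fig, ax = plt.subplots(figsize=(8, 4))

# One scatter series per measurement type, labeled for the legend
for name, group in df.groupby("measurement"):
    ax.scatter(group["temperature"], group["value"], label=name)

ax.set_xlabel("Temperature")
ax.set_ylabel("Value")
ax.legend(title="Measurement")
ax.grid(True)
plt.show()
```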

#### Example: Lots of Plot Types!

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Data for plotting
x = np.linspace(0, 10, 100)
y = np.sin(x)
categories = ['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry']
values = [5, 3, 4, 2, 6]
sizes = [215, 130, 245, 210, 300]
data_for_box = np.random.normal(0, 1, size=100)

# Create a figure to hold all subplots
fig, axs = plt.subplots(3, 2, figsize=(12, 12))  # 3x2 subplot grid

# Line plot
axs[0, 0].plot(x, y)
axs[0, 0].set_title('Line Plot')
axs[0, 0].set_xlabel('x')
axs[0, 0].set_ylabel('sin(x)')

# Scatter plot
axs[0, 1].scatter(x, np.abs(y), c=y, cmap='viridis')
axs[0, 1].set_title('Scatter Plot')
axs[0, 1].set_xlabel('x')
axs[0, 1].set_ylabel('abs(sin(x))')

# Histogram
axs[1, 0].hist(y, bins=10)
axs[1, 0].set_title('Histogram')
axs[1, 0].set_xlabel('sin(x)')
axs[1, 0].set_ylabel('Frequency')

# Bar chart
axs[1, 1].bar(categories, values, color='indigo')
axs[1, 1].set_title('Bar Chart')
axs[1, 1].set_xlabel('Fruit')
axs[1, 1].set_ylabel('Values')

# Pie chart
axs[2, 0].pie(sizes, labels=categories, autopct='%1.1f%%', startangle=90)
axs[2, 0].set_aspect('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
axs[2, 0].set_title('Pie Chart')

# Box plot
axs[2, 1].boxplot(data_for_box)
axs[2, 1].set_title('Box Plot')
axs[2, 1].set_xlabel('Data Set')
axs[2, 1].set_ylabel('Values')

# Adjust layout to prevent overlap
plt.tight_layout()

# Show plot
plt.show()


Next we'll move on to some TA-specific stuff in [this notebook](3_TADataKit.ipynb).