# Tables and plots

Python is a powerful tool for exploring data stored in **DataFrames** using the [Pandas](https://pandas.pydata.org/) library. In this notebook, I show a few basic examples of how to navigate tables with Pandas, plot data using [Matplotlib](https://matplotlib.org/) and [Seaborn](https://seaborn.pydata.org/), and perform statistical tests using [SciPy](https://docs.scipy.org/doc/scipy/reference/stats.html). For learning more, I highly recommend diving into any of the linked resources.


**General resources**
* [Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/) by Jonas Kristoffer Lindeløv.
* [Learning statistics with Python](https://ethanweed.github.io/pythonbook/landingpage.html) by Danielle Navarro and Ethan Weed.
* [Neural Data Science in Python](https://neuraldatascience.io/intro.html) by Aaron J Newman.
* [Pandas Cookbook](https://github.com/jvns/pandas-cookbook) by Julia Evans.
* [Pandas Notebooks](https://github.com/plembo/pandas-tutorials) by Corey Schafer.
* [The Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) by Jake VanderPlas. See [Visualization with Matplotlib](https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html), [Visualization with Seaborn](https://jakevdp.github.io/PythonDataScienceHandbook/04.14-visualization-with-seaborn.html), and [Data Manipulation with Pandas](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html).


# Example data

In this notebook, I will use the brain cell database from the **Allen Institute for Brain Science** as an example dataset.  You can also download the table directly from the [Allen Brain Atlas website](https://celltypes.brain-map.org/data) by clicking **"Download Cell Feature Data."** This dataset contains electrophysiological and morphological data obtained from patch-clamp recordings in both mouse and human brains. These features were used to classify neurons into distinct subtypes (**Figure 1**), which is one of the main challenges in neuroscience because of the [brain’s high cellular diversity](https://spikesandbursts.wordpress.com/2021/02/28/mapping-neuronal-diversity/).  

Defining a cell type is not trivial, and morphoelectrical features can be combined with transcriptomic data for a more integrative classification ([Gouwens et al., 2020](https://pubmed.ncbi.nlm.nih.gov/33186530/), [Scala et al., 2021](https://www.nature.com/articles/s41586-020-2907-3)). The goal is to identify cell types with similar attributes and functions. For example, although **inhibitory interneurons** ([Tremblay et al., 2016](https://www.sciencedirect.com/science/article/pii/S0896627316303117)) are fewer than excitatory neurons in the cortex (about 20% vs. 80%), there are at least four major classes of inhibitory neurons with distinct intrinsic and functional properties: Pvalb, Vip, Sst, and Lamp5 (**Figure 1b**).  


<img src="../Figures/neurons_morpho-electric_types_gouwens_et_al_2019.png" width="1200"/>

**Figure 1.** Examples of excitatory (a) and inhibitory (b) neurons in the mouse visual cortex. Top panels: morphological reconstructions of dendrites.  Bottom panels: electrophysiological responses from the same neurons to hyperpolarizing and depolarizing current injection.  Source: [Gouwens et al., 2019](https://pmc.ncbi.nlm.nih.gov/articles/PMC8078853/). 


You can explore the Allen Institute Cell Database using the [interactive website](https://celltypes.brain-map.org/data) or the [Allen Software Development Kit (SDK)](https://allensdk.readthedocs.io/en/latest/).  Through the Allen SDK, you can access all available electrophysiology measurements and morphological reconstructions (see links below).

**Additional resources and reading**
* [Cell Types Database](https://alleninstitute.github.io/AllenSDK/_static/examples/nb/cell_types.html). Example notebooks and documentation.  
* [Open Neuroscience Education](https://sites.google.com/ucsd.edu/neuroedu/cell-types/jupyter-instructions). Excellent teaching resources created by Dr. Ashley Juavinett.  
* [Transgenic mouse lines](https://portal.brain-map.org/explore/toolkit/mice). The Allen Institute for Brain Science used [Cre mouse lines](https://spikesandbursts.wordpress.com/2018/07/18/cre-mice-for-targeting-neurons/) to identify genetically defined cell classes. Driver lines are used to target specific populations, while reporter lines are used to visualize (with fluorescent proteins) or manipulate those populations.
* [Zheng et al., 2022](https://www.sciencedirect.com/science/article/pii/S0092867422007838). What is a cell type and how to define it?

# Import the packages

In [None]:
import os

# Dataframes
import pandas as pd
import numpy as np

# Optional: Display all columns in table
pd.set_option('display.max_columns', None)

# Display all or a specific number of rows
# pd.set_option('display.max_rows', 20)

# Stats libraries
import scipy.stats as stats
from scipy.stats import shapiro, bartlett

# Plotting libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Interactive plots (comment out if needed)
# import ipywidgets as widgets
# from IPython.display import display
# #For Jupyter Lab:
# %matplotlib widget

# Paths

In [None]:
# Change the paths and folder names according to your data structure
notebook_name = 'dataframes_plots'

# Path to the folder that contains 'Data_examples'. Adjust it to your data structure.
data_path = os.path.dirname(os.getcwd())  # Moves one level up from the current directory

# Change the folder names accordingly
paths = {'data': data_path,
         'raw_data':  f'{data_path}/Data_examples/{notebook_name}/',
         'processed_data': f'{data_path}/Processed_data_examples/{notebook_name}/',
         'analysis': f'{data_path}/Analysis_examples/{notebook_name}/',         
         'plots': f'{data_path}/Analysis_examples/{notebook_name}/Plots/'}

# Make folders if they do not exist yet
for path in paths.values():
    os.makedirs(path, exist_ok=True)

# Plot settings

In [None]:
# Matplotlib settings
plt.rcParams.update({
    'font.family': 'Arial',
    'font.size': 18,  # Base size used by most elements
    'axes.labelsize': 18,     # Axis labels
    'axes.titlesize': 18,     # Plot titles
    'xtick.labelsize': 18,    # X axis numbers
    'ytick.labelsize': 18,    # Y axis numbers
    'legend.fontsize': 18,    # Legend text
    'savefig.transparent': True,
    'svg.fonttype': 'none',   # Editable text in SVGs
})

# Seaborn settings
sns.set(style="ticks",  
        context="notebook", 
        font="Arial",
        rc={
            "axes.labelsize": 18,
            "axes.titlesize": 18,
            "xtick.labelsize": 18,
            "ytick.labelsize": 18,
            "legend.fontsize": 18,
        })

# Load the dataframe

* For Excel files, you can use [`pandas.read_excel`](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html).
* For CSV files, you can use [`pandas.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).


In [None]:
dataset = pd.read_csv(os.path.join(paths['raw_data'], 'cell_types_specimen_details.csv'))  # or pd.read_csv(paths['raw_data'] + '/cell_types_specimen_details.csv')

dataset.head(10)  # Use head or tail to show the first or last n rows

# Merge dataset

In this example, the data is already in one table, so I give a mock example of how to join tables on a shared column using [`pandas.merge`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html). You can also combine tables using the index of the DataFrames with [`DataFrame.join`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html).
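
As a minimal sketch of the difference between the two, the toy tables below (made-up values and column names, not taken from the Allen dataset) are combined first on a shared column with `pd.merge` and then on the index with `DataFrame.join`. The `how` and `indicator` arguments are optional extras that help check which rows were matched.

```python
import pandas as pd

# Toy tables with hypothetical values (not from the Allen dataset)
ephys = pd.DataFrame({'cell_id': [1, 2, 3], 'tau': [12.5, 20.1, 8.3]})
morpho = pd.DataFrame({'cell_id': [2, 3, 4], 'n_branches': [30, 45, 28]})

# merge: match rows on a shared column; the default how='inner' keeps only matches
inner = pd.merge(ephys, morpho, on='cell_id')

# how='outer' keeps all rows; indicator=True adds a '_merge' column with the origin
outer = pd.merge(ephys, morpho, on='cell_id', how='outer', indicator=True)

# join: match rows on the index instead of a column (left join by default)
joined = ephys.set_index('cell_id').join(morpho.set_index('cell_id'))

print(inner)
print(outer)
print(joined)
```

An inner merge silently drops unmatched rows, so comparing `len()` before and after merging is a quick sanity check.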

In [None]:
# Mock example splitting the dataset and joining it again
dataset = pd.read_csv(os.path.join(paths['raw_data'], 'cell_types_specimen_details.csv'))

# Split into two subsets, keeping 'specimen__id' in both
dataset_subset1 = dataset[['specimen__id'] + list(dataset.columns[2:10])]
dataset_subset2 = dataset[['specimen__id'] + list(dataset.columns[11:20])]

# Save subsets to the analysis folder
dataset_subset1.to_csv(os.path.join(paths['analysis'], 'dataset_subset1.csv'), index=False)
dataset_subset2.to_csv(os.path.join(paths['analysis'], 'dataset_subset2.csv'), index=False)

# Join the subsets on the common column 'specimen__id'
dataset_merged = pd.merge(dataset_subset1, dataset_subset2, on='specimen__id')

# Save the merged dataframe
dataset_merged.to_csv(os.path.join(paths['analysis'], 'dataset_merged.csv'), index=False)
dataset_merged.head()

# Transform the table

Once you have merged the dataset, you may still need to transform it to add calculations or new labels. Common transformations include:

- Transpose the dataframe: `dataframe.T`
- Insert a column: `dataframe.insert` or `dataframe.assign`. `insert` adds a new column at a specific position and modifies the dataframe in place, while `assign` returns a new dataframe with the added column and leaves the original unchanged unless you reassign the result.
- Rename a column: `dataframe.rename`

It is advisable to work on a copy of the table and, when saving, use a different name to avoid overwriting the original dataset.
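
The sketch below illustrates these transformations on a small toy table (hypothetical values and column names), mainly to show which calls modify the dataframe in place and which return a new one.

```python
import pandas as pd

# Toy table with hypothetical values
df = pd.DataFrame({'tau': [12.5, 20.1, 8.3], 'ri': [150.0, 210.0, 95.0]})

# assign returns a new dataframe; df itself is unchanged
df_assigned = df.assign(id=[1, 2, 3])

# insert modifies the dataframe in place (and returns None), so work on a copy
df_inserted = df.copy()
df_inserted.insert(0, 'id', [1, 2, 3])

# rename also returns a new dataframe
df_renamed = df.rename(columns={'ri': 'input_resistance'})

# Transpose swaps rows and columns
df_transposed = df.T

print(list(df.columns))           # ['tau', 'ri']
print(list(df_assigned.columns))  # ['tau', 'ri', 'id']
print(list(df_inserted.columns))  # ['id', 'tau', 'ri']
print(list(df_renamed.columns))   # ['tau', 'input_resistance']
print(df_transposed.shape)        # (2, 3)
```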

## Insert a column

In [None]:
# Insert a new column
dataset_processed = dataset.copy()  # Make a copy of the original dataset
dataset_processed.insert(0, 'id', range(1, len(dataset_processed) + 1))
dataset_processed.head(10)

In [None]:
# assign returns a new dataframe with the extra column; `dataset` itself is not modified
dataset.assign(id=range(1, len(dataset) + 1)).head(10)

In [None]:
# The original dataset has not been changed
dataset.head()

## Rename columns


In [None]:
# Rename a column
dataset_processed = dataset.copy()  # Make a copy of the original dataset
dataset_processed = dataset_processed.rename(
    columns={"structure__name": "brain_structure"})

dataset_processed.head()

# Explore the dataset

Here are some Pandas attributes and methods that are useful for getting an overview of large datasets. In Jupyter or IPython, you can also type `?` after any of them (for example, `dataset.head?`) for a longer explanation.

* `dataset.shape`. Shows the number of rows and columns (also displayed at the bottom of the table).
* `dataset.columns`. Lists the names of all columns.
* `dataset.info()`. Displays all columns, the number of non-null values, and the data type of each column.
* `dataset.head()`. Returns the first few rows (5 by default).
* `dataset.tail()`. Returns the last few rows (5 by default).
* `dataset.loc[rows, [columns]]`. Select specific rows and columns by label.
* `dataset.sort_values(by='column_name')`. Sorts the dataframe by a specific column (ascending by default).
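
A short sketch of some of these calls on a toy table (hypothetical values and column names), which makes the results easy to verify by eye:

```python
import pandas as pd

# Toy table with hypothetical values
cells = pd.DataFrame({
    'structure': ['VISp5', 'VISp2/3', 'VISp4', 'VISp5'],
    'firing_rate': [35.2, 12.7, 50.1, 8.4],
})

print(cells.shape)          # (4, 2)
print(list(cells.columns))  # ['structure', 'firing_rate']

# loc selects by label, and label slices include the end point: rows 0, 1, and 2
first_rows = cells.loc[0:2, ['structure']]
print(first_rows)

# sort_values returns a sorted copy; ascending=False reverses the order
by_rate = cells.sort_values(by='firing_rate', ascending=False)
print(by_rate.head(2))
```

Note that `loc[0:2]` returns three rows, unlike regular Python slicing, because label-based slices are inclusive.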

In [None]:
# Select rows 0 to 5 (inclusive) of the 'structure__name' column
dataset.loc[0:5, ['structure__name']]

## Filter the table

Filtering the table is not strictly necessary, since you can select specific rows and columns directly for analysis. However, it is often convenient, for example to keep only the mouse data. You can then use the functions above to check the sample size of the filtered subset, or combine several conditions to find the group of interest (see the examples in plots and statistics). An important note for beginners, to avoid a common source of `KeyError` in Pandas:

* `['column_name']` returns a Pandas Series and can only select one column.
* `[['column_name']]` returns a Pandas DataFrame, which can include one or more columns.
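
The difference can be checked on a toy table (hypothetical values). The last lines show the typical mistake: passing two column names inside single brackets.

```python
import pandas as pd

# Toy table with hypothetical values
cells = pd.DataFrame({'species': ['Mus musculus', 'Homo Sapiens'],
                      'tau': [12.5, 20.1],
                      'ri': [150.0, 210.0]})

one_column = cells['tau']           # single brackets: Series
one_column_df = cells[['tau']]      # double brackets: DataFrame with one column
two_columns = cells[['tau', 'ri']]  # double brackets: DataFrame with two columns

print(type(one_column).__name__)     # Series
print(type(one_column_df).__name__)  # DataFrame
print(two_columns.shape)             # (2, 2)

# Single brackets with two names look for ONE column called ('tau', 'ri')
raised_key_error = False
try:
    cells['tau', 'ri']
except KeyError as error:
    raised_key_error = True
    print('KeyError:', error)
```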

In [None]:
dataset_mice = dataset[(dataset['donor__species'] == 'Mus musculus')]
dataset_mice

In [None]:
dataset_mice_ephys = dataset_mice[[column for column in dataset_mice.columns if column.startswith('ef')]]
# Option B: same results
# dataset_mice_ephys = dataset_mice.loc[:, dataset_mice.columns.str.startswith('ef')]

dataset_mice_ephys.head(10)

## Aggregate the table

Aggregation in Pandas allows you to group data by one or more columns and then compute summary statistics or other calculations for each group, such as counts, means, or sums.
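
One detail worth knowing when counting cells per group is that `'size'` counts every row, whereas `'count'` only counts non-missing values. The sketch below shows the difference on a toy table (hypothetical values and column names).

```python
import numpy as np
import pandas as pd

# Toy table with hypothetical values; two firing rates are missing
cells = pd.DataFrame({
    'structure': ['VISp5', 'VISp5', 'VISp4', 'VISp4', 'VISp1'],
    'firing_rate': [10.0, 30.0, 5.0, np.nan, np.nan],
})

summary = cells.groupby('structure').agg(
    num_rows=('firing_rate', 'size'),     # counts every row, including NaN
    num_values=('firing_rate', 'count'),  # counts only non-missing values
    mean_rate=('firing_rate', 'mean'),    # NaN values are skipped
)
print(summary)

# Groups without any valid value end up with a NaN mean and can be dropped
print(summary.dropna())
```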

In [None]:
# Group by structure and feature
groupby_parameter = 'structure__acronym'
ephys_feature = 'ef__avg_firing_rate'

grouped_dataset = (
    dataset.groupby(groupby_parameter)
    .agg(num_cells = (ephys_feature, 'size'), 
         av_firing_rate = (ephys_feature, 'mean'))
    .dropna()  # Drop groups without a valid mean (all values missing)
)

grouped_dataset

## Sort the table

In [None]:
dataset.sort_values(by='ef__avg_firing_rate')

## Summary statistics

The `describe` function gives you a quick overview of common descriptive statistics. You can use it on the whole dataframe or on specific features. Individual statistics are also available as separate methods, for example:

```python
dataset.mean(numeric_only=True)
dataset.mean(numeric_only=True).to_frame()  # Converts the result into a DataFrame
```

In [None]:
# Statistics for neuron subtype

neuron_subtype = dataset_mice[
    (dataset_mice.line_name == 'Pvalb-IRES-Cre') &
    (dataset_mice.structure__acronym == 'VISp5')
]

neuron_subtype.describe()

In [None]:
# Show the number of rows without values (NaN or null)
neuron_subtype.isna().sum().to_frame(name='NaN Count').T

## Unique values in columns

You can quickly check all unique values in a column using `unique()`, such as neuron subtypes, brain areas, treatments, etc., or count how often each value occurs with `value_counts()`. In this example, we look at all mouse lines used for recordings in the primary visual area by the Allen Institute for Brain Science. Since neurons were recorded in different cortical layers, the handy string methods [`startswith`](https://www.codecademy.com/resources/docs/python/strings/startswith) and [`endswith`](https://www.codecademy.com/resources/docs/python/strings/endswith), available in Pandas through the `.str` accessor, are very helpful.
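
As a small sketch on a toy table (hypothetical labels that only mimic the naming style of the dataset), the same ideas look like this:

```python
import pandas as pd

# Toy table with hypothetical labels
cells = pd.DataFrame({
    'line': ['Pvalb', 'Sst', 'Pvalb', 'Vip', 'Sst', 'Pvalb'],
    'structure': ['VISp5', 'VISp2/3', 'VISp2/3', 'VISl5', 'VISp5', 'VISp4'],
})

print(cells['line'].unique())   # distinct values, in order of first appearance
print(cells['line'].nunique())  # number of distinct values

# Boolean masks built with the .str accessor
in_visp = cells['structure'].str.startswith('VISp')
in_layer5 = cells['structure'].str.endswith('5')

# Number of cells per line within VISp
counts = cells.loc[in_visp, 'line'].value_counts()
print(counts)

# Masks can be combined with & (and), | (or), and ~ (not)
visp_layer5 = cells.loc[in_visp & in_layer5]
print(visp_layer5)
```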

In [None]:
dataset_mice['line_name'][dataset_mice['structure__acronym'].str.startswith('VISp')].value_counts()

# Plots and statistics

[Matplotlib](https://matplotlib.org/) and [Seaborn](https://seaborn.pydata.org/) are two powerful libraries for data visualization in Python that allow you to create high-quality figures. While Python may not be as user-friendly as R for statistics, [SciPy](https://docs.scipy.org/doc/scipy/reference/stats.html) and [statsmodels](https://www.statsmodels.org/stable/index.html) do the job.


## Pandas plots

[Pandas](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html) has built-in plotting functions that are easy to use. They are not as versatile as Matplotlib or Seaborn, but they are useful for quickly plotting the data.

In [None]:
ephys_feature = 'ef__tau'

feature_means = dataset.groupby('line_name')[ephys_feature].mean()
feature_means.plot(kind='bar', figsize=(12,4))


## Matplotlib: plot + 3-group stats

[Matplotlib](https://matplotlib.org/) is one of the most commonly used plotting libraries in Python. It allows you to customize axes, colors, labels, etc. You can combine it with [SciPy](https://docs.scipy.org/doc/scipy/reference/stats.html) to perform statistical analysis.

To compare three or more groups, you can use **one-way ANOVA**. This test assumes numeric data, independent samples, normally distributed groups, and homogeneity of variances. If the sample sizes are large enough (normally >25-30), the distribution of the sample means will approximate a Gaussian distribution even if the population itself is not Gaussian (Central Limit Theorem). Normality tests have limited utility in ANOVA for moderate to large samples. If the assumptions of ANOVA are violated, you can use the non-parametric alternative called the [Kruskal-Wallis test](https://www.graphpad.com/guides/prism/latest/statistics/stat_checklist_kw.htm), which does not require normally distributed data.
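
The sketch below runs these tests on synthetic data (randomly generated values, not the Allen dataset) so that the expected outcome is known in advance. It uses Levene's test for the homogeneity of variances, which is an alternative to Bartlett's test that is less sensitive to departures from normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Synthetic groups (hypothetical values): same spread, different means
group_a = rng.normal(loc=10, scale=2, size=40)
group_b = rng.normal(loc=12, scale=2, size=40)
group_c = rng.normal(loc=15, scale=2, size=40)
groups = [group_a, group_b, group_c]

# Homogeneity of variances
levene_stat, levene_p = stats.levene(*groups)
print(f"Levene: W = {levene_stat:.3f}, p = {levene_p:.3f}")

# Parametric comparison of the group means
anova = stats.f_oneway(*groups)
print(f"One-way ANOVA: F = {anova.statistic:.3f}, p = {anova.pvalue:.2e}")

# Non-parametric alternative based on ranks
kruskal = stats.kruskal(*groups)
print(f"Kruskal-Wallis: H = {kruskal.statistic:.3f}, p = {kruskal.pvalue:.2e}")
```

Both tests only indicate that at least one group differs from the others. Finding out which groups differ requires a post hoc test.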


In [None]:

# Load dataset
dataset = pd.read_csv(os.path.join(paths['raw_data'], 'cell_types_specimen_details.csv'))

# Filter dataset for your region of interest
filtered_dataset = dataset[
    (dataset['donor__species'] == 'Mus musculus') & 
    (dataset['structure__acronym'] == 'VISp2/3')
    # (dataset['structure__acronym'].str.startswith('VISp'))
]

# Define the groups to compare
group_labels = ['Pvalb-IRES-Cre', 'Sst-IRES-Cre', 'Vip-IRES-Cre']

# Feature for comparison
feature = 'ef__avg_firing_rate'

# Extract feature values for each group (drop NaNs)
group_data = [
    filtered_dataset[filtered_dataset['line_name'] == label][feature].dropna()
    for label in group_labels
]

# Boxplot
fig, ax = plt.subplots(figsize=(4,4))
ax.boxplot(group_data, labels=[lbl.split('-')[0] for lbl in group_labels])
ax.set_ylabel("Average firing rate (Hz)")

# Adjust the layout and save before showing, so the displayed and saved figures match
fig.tight_layout()
plot_path = f"{paths['plots']}VISp_L23_interneurons_{feature}.png"
fig.savefig(plot_path, dpi=300)
plt.show()

# Statistics
group_stats = pd.DataFrame(columns=['n', 'mean', 'std', 'Shapiro_W', 'Shapiro_p'])

# Loop over groups
for i, group in enumerate(group_data):
    group_stats.at[group_labels[i], 'n'] = len(group)
    group_stats.at[group_labels[i], 'mean'] = group.mean()
    group_stats.at[group_labels[i], 'std'] = group.std()
    
    # Shapiro-Wilk test
    W, p = shapiro(group)
    group_stats.at[group_labels[i], 'Shapiro_W'] = W
    group_stats.at[group_labels[i], 'Shapiro_p'] = p

# Equality of variances (Bartlett's test)
bartlett_stat, bartlett_p = bartlett(group_data[0], group_data[1], group_data[2])
p_str = f"{bartlett_p:.2f}"
print(f"Bartlett's test: stat = {bartlett_stat:.3f}, p = {p_str}")

# # ANOVA
# anova_results = stats.f_oneway(*group_data)
# p_str = f"{anova_results.pvalue:.3f}" if anova_results.pvalue >= 0.0001 else "<0.0001"
# print(f"\nOne-way ANOVA: F = {anova_results.statistic:.3f}, p = {p_str}")

# Kruskal-Wallis test (non-parametric)
kruskal_results = stats.kruskal(group_data[0], group_data[1], group_data[2])
p_str = f"{kruskal_results.pvalue:.3f}" if kruskal_results.pvalue >= 0.0001 else "<0.0001"
print(f"\nKruskal-Wallis test: H = {kruskal_results.statistic:.3f}, p = {p_str}")

group_stats

## Seaborn

[Seaborn](https://seaborn.pydata.org/) is built on Matplotlib and offers higher-level plotting functions that create more complex statistical plots with fewer lines of code. One useful feature is automatic correlation plots (pair plots).
 

### Plot

In [None]:
# Load dataset
dataset = pd.read_csv(os.path.join(paths['raw_data'], 'cell_types_specimen_details.csv'))

# Filter dataset for your region of interest
filtered_dataset = dataset[
    (dataset['donor__species'] == 'Mus musculus') & 
    (dataset['structure__acronym'] == 'VISp2/3')]

x_categories = 'line_name'
y_feature = 'ef__ri'

# Create figure and axes
fig, ax = plt.subplots(figsize=(16, 10))

# Plot the boxplot on the Axes
sns.boxplot(x=x_categories, y=y_feature, data=filtered_dataset, ax=ax)

# Rotate x-axis labels
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha='right')

# Set axis labels
ax.set_xlabel(x_categories)
ax.set_ylabel(y_feature)

# Adjust layout
fig.tight_layout()

# Save figure
plot_path = f"{paths['plots']}VISp_L23_{x_categories}_{y_feature}.png"  # or .svg
fig.savefig(plot_path, dpi=300)

# Show the figure
plt.show()

### Plot + 2-group stats

Assumptions for t-tests: one numeric variable, one or two groups (one-sample or two-sample test), random sampling, and approximately normally distributed data. The standard two-sample test also assumes similar variances in both groups.

- GraphPad. [How to test for normality](https://www.graphpad.com/guides/prism/latest/statistics/stat_how_to_normality_test.htm).
- GraphPad. [Ultimate guide to T-tests](https://www.graphpad.com/guides/the-ultimate-guide-to-t-tests).
- Learning statistics with Python. [Comparing two means](https://ethanweed.github.io/pythonbook/05.02-ttest.html). 
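
When these assumptions are in doubt, SciPy offers two common alternatives: Welch's t-test (`equal_var=False`), which does not assume equal variances, and the non-parametric Mann-Whitney U test. The sketch below compares them on synthetic data (randomly generated values, not the Allen dataset).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Synthetic groups (hypothetical values) with different means and spreads
group_1 = rng.normal(loc=200, scale=30, size=25)
group_2 = rng.normal(loc=280, scale=60, size=30)

# Student's t-test: assumes equal variances in both groups
student = stats.ttest_ind(group_1, group_2)

# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(group_1, group_2, equal_var=False)

# Mann-Whitney U test: rank-based, does not assume normality
mann_whitney = stats.mannwhitneyu(group_1, group_2, alternative='two-sided')

print(f"Student's t-test: t = {student.statistic:.3f}, p = {student.pvalue:.2e}")
print(f"Welch's t-test:   t = {welch.statistic:.3f}, p = {welch.pvalue:.2e}")
print(f"Mann-Whitney U:   U = {mann_whitney.statistic:.1f}, p = {mann_whitney.pvalue:.2e}")
```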

In [None]:
# Filter dataset 
subtype_dataset = dataset[
    (dataset['donor__species'] == 'Mus musculus') & 
    (dataset['structure__acronym'].str.startswith('VISp')) &
    (dataset['line_name'] == 'Vip-IRES-Cre')]

# Parameters to plot
layer_column = 'structure__acronym'
layer_L23 = 'VISp2/3'
layer_L5 = 'VISp5'

feature_column = 'ef__ri'  # Feature to compare (e.g., input resistance)

# Apply filters to create subsets
L23_subgroup = subtype_dataset[layer_column].eq(layer_L23)
L5_subgroup = subtype_dataset[layer_column].eq(layer_L5)

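# Drop missing values so that the boxplot and the Shapiro-Wilk test use valid data only
group_L23 = subtype_dataset[L23_subgroup][feature_column].dropna()
group_L5 = subtype_dataset[L5_subgroup][feature_column].dropna()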

# Plot boxplot
fig, ax = plt.subplots(figsize=(5,5))
ax.boxplot([group_L23, group_L5])
ax.set_xticklabels([layer_L23, layer_L5])
ax.set_ylabel(feature_column)

# Create summary table
summary_table = pd.DataFrame(index=[layer_L23, layer_L5], columns=['count', 'mean', 'std'])
summary_table.loc[layer_L23, 'count'] = group_L23.count()
summary_table.loc[layer_L5, 'count'] = group_L5.count()
summary_table.loc[layer_L23, 'mean'] = group_L23.mean()
summary_table.loc[layer_L5, 'mean'] = group_L5.mean()
summary_table.loc[layer_L23, 'std'] = group_L23.std()
summary_table.loc[layer_L5, 'std'] = group_L5.std()

# Independent T-test
ttest_results = stats.ttest_ind(group_L23, group_L5, nan_policy='omit')

# Save figure
fig.tight_layout()
plot_path = f"{paths['plots']}VISp_L23_5_VIP_{feature_column}.png"  # or .svg
fig.savefig(plot_path, dpi=300)
plt.show()

print("Results of the T-test:", ttest_results)
# Test normality for each group
shapiro_L23 = shapiro(group_L23)
shapiro_L5 = shapiro(group_L5)

print(f"Shapiro-Wilk normality test {layer_L23}: W = {shapiro_L23.statistic:.3f}, p = {shapiro_L23.pvalue:.2f}")
print(f"Shapiro-Wilk normality test {layer_L5}: W = {shapiro_L5.statistic:.3f}, p = {shapiro_L5.pvalue:.2f}")
summary_table

### Pairwise correlations

In [None]:
filtered_dataset = dataset[
    (dataset['donor__species'] == 'Mus musculus') & 
    (dataset['structure__acronym'].str.startswith('VISp')) &
    (dataset['line_name'] == 'Vip-IRES-Cre')]

columns_pairplot = ['ef__f_i_curve_slope',
                   'ef__avg_firing_rate',
                   'ef__tau',
                   'ef__ri']

# Create the pair plot
pair_plot = sns.pairplot(filtered_dataset[columns_pairplot], diag_kind='hist')

# Adjust layout
pair_plot.fig.tight_layout()

# Save figure
plot_path = f"{paths['plots']}VISp_L23_5_VIP_pairplot.png"  # or .svg
pair_plot.fig.savefig(plot_path, dpi=300)

plt.show()

### Linear regression

Remember that the **correlation coefficient** measures the strength and direction of a linear relationship between two variables without assuming causality, whereas **regression** assumes a directional relationship and estimates the effect of one variable on another. The Pearson correlation coefficient is commonly denoted as **r**, and its square (**r²**) represents the proportion of variance in the dependent variable that is linearly explained by the independent variable.

In the code below, I plot a **regression line** for two electrophysiological features. Here, **r²** is the square of the correlation coefficient returned by `linregress` and indicates how well the line explains the variability in the data. The associated **p-value** tests whether the slope is significantly different from zero. 

- GraphPad guide. [The difference between correlation and regression](https://www.graphpad.com/guides/prism/latest/statistics/stat_the_difference_between_correla.htm).
- Handbook of Biological Statistics. [Correlation and linear regression](https://www.biostathandbook.com/linearregression.html).
- Statistical Thinking for the 21st Century. [Linear regression](https://statsthinking21.github.io/statsthinking21-core-site/the-general-linear-model.html#linear-regression). 
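
To make the link between correlation and regression concrete, the sketch below uses synthetic data (randomly generated values with an arbitrary slope and noise level) and checks that the `rvalue` from `linregress` matches the Pearson coefficient, and that its square equals the fraction of variance explained by the fitted line.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)

# Synthetic data (hypothetical values): linear trend plus Gaussian noise
x = rng.uniform(100, 400, size=50)
y = -75 + 0.02 * x + rng.normal(scale=3, size=50)

# Linear regression
fit = stats.linregress(x, y)

# Pearson correlation computed independently
pearson_r, pearson_p = stats.pearsonr(x, y)

# r² from its definition: 1 - (residual sum of squares / total sum of squares)
predicted = fit.intercept + fit.slope * x
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"linregress r = {fit.rvalue:.4f}, Pearson r = {pearson_r:.4f}")
print(f"r² from rvalue = {fit.rvalue**2:.4f}, r² from residuals = {r_squared:.4f}")
```

For a simple linear regression with one predictor, the p-value of the slope and the p-value of the Pearson correlation test the same hypothesis.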

In [None]:
# Filter dataset
filtered_dataset = dataset[
    (dataset['donor__species'] == 'Mus musculus') & 
    (dataset['structure__acronym'].str.startswith('VISp')) &
    (dataset['line_name'] == 'Vip-IRES-Cre')
]

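# Select features for correlation (drop rows with missing values, which would turn the fit into NaN)
features = filtered_dataset[['ef__ri', 'ef__vrest']].dropna()
feature_x = features['ef__ri']
feature_y = features['ef__vrest']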

# Fit linear regression and get statistics
slope, intercept, r_value, p_value, std_err = stats.linregress(feature_x, feature_y)

# Plot scatter
fig, ax = plt.subplots(figsize=(5,5))
ax.scatter(feature_x, feature_y, color='#2ca25f', label='Data points')

# Plot regression line
ax.plot(feature_x, slope*feature_x + intercept, color='black', linewidth=2, label='Regression line')

# Labels
ax.set_xlabel(feature_x.name)
ax.set_ylabel(feature_y.name)

# Annotate r² and p-value
ax.text(
    0.05, 0.95,
    f"$r^2 = {r_value**2:.3f}$\n$p = {p_value:.3e}$",
    transform=ax.transAxes,
    verticalalignment='top',
    bbox=dict(facecolor='white', alpha=0.8))


# Save figure
fig.tight_layout()
plot_path = f"{paths['plots']}VISp_L23_5_VIP_ri_vrest.png"
fig.savefig(plot_path, dpi=300)

plt.show()

# Optional: print regression results in the notebook
print(f"Regression equation: y = {slope:.3f} * x + {intercept:.3f}")
print(f"R² = {r_value**2:.3f}")
print(f"P-value of slope: {p_value:.3e}")

# Compute Pearson correlation coefficient
pearson_r = feature_x.corr(feature_y)
print(f"Pearson correlation coefficient (r) = {pearson_r:.3f}")