# Data Analysis, Statistics, and Plotting Techniques in Python

This notebook demonstrates foundational data analysis, statistics, and plotting techniques using pandas, numpy, and matplotlib in Python. Each step includes a code example and a brief explanation.

## 1. Importing Libraries
We'll need pandas for data manipulation, numpy for numerical routines such as line fitting, and matplotlib for plotting.

In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## 2. Data Importing
Read a whitespace-delimited text file from a URL with `pd.read_csv`, setting the separator and the comment character so that the file's `#` header lines are skipped.

In [ ]:
url = 'https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt'
# r'\s+' (a raw string) splits on any run of whitespace; comment='#' skips the header notes
df = pd.read_csv(url, sep=r'\s+', comment='#', header=None)
df.head()
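
The same `read_csv` options can be tried offline on a small in-memory sample. The text below is made up and only mimics the layout described above (`#`-commented header lines followed by whitespace-separated columns); it is not the real NOAA data.

In [ ]:
```python
import io

import pandas as pd

# Made-up sample mimicking a '#'-commented, whitespace-delimited file (not real NOAA data)
sample = """# comment lines like this one are skipped
# another comment
1960  1  1960.042  316.43  316.92
1960  2  1960.125  316.97  316.77
1960  3  1960.208  317.58  316.68
"""

# Same options as the cell above; io.StringIO stands in for the URL
df_sample = pd.read_csv(io.StringIO(sample), sep=r'\s+', comment='#', header=None)
print(df_sample.shape)
print(df_sample)
```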

## 3. Assigning Column Names
If the file has no header row, assign column names manually. The names must follow the column order described in the file's comment header.

In [ ]:
df.columns = ['year', 'month', 'decimal_year', 'average', 'deseasonalised', 'no_days', 'std_of_days', 'unc_of_mean']
df.head()
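
Column names can also be supplied while reading, through the `names=` argument of `pd.read_csv`; when `names` is given, pandas treats the file as having no header row. A minimal sketch on a made-up two-row sample:

In [ ]:
```python
import io

import pandas as pd

# Made-up sample (not real data)
sample = """# header comment
1960 1 1960.042 316.43
1960 2 1960.125 316.97
"""

cols = ['year', 'month', 'decimal_year', 'average']
# names= labels the columns during parsing, so no separate df.columns assignment is needed
df_named = pd.read_csv(io.StringIO(sample), sep=r'\s+', comment='#', names=cols)
print(df_named.columns.tolist())
```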

## 4. Viewing and Exploring Data
Use `.head()`, `.columns`, and `.describe()` to inspect your dataset.

In [ ]:
print("Columns:", df.columns.tolist())
df.describe()
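
`describe()` is also a quick way to spot placeholder values: a negative `min` in a column that should be non-negative (a day count or a standard deviation, for instance) usually marks missing data. Whether such sentinels appear, and which values they take, depends on the file version, so check its header comments. The sketch below assumes any negative value in those two columns is a sentinel and uses made-up numbers.

In [ ]:
```python
import pandas as pd

# Made-up values; -1 and -9.99 play the role of 'missing' sentinels (an assumption)
df_demo = pd.DataFrame({
    'average': [315.7, 317.5, 317.5, 318.0],
    'no_days': [-1, 26, 23, -1],
    'std_of_days': [-9.99, 0.37, 0.42, -9.99],
})

# mask() replaces values where the condition is True with NaN,
# so the sentinels no longer distort the summary statistics
cols = ['no_days', 'std_of_days']
df_demo[cols] = df_demo[cols].mask(df_demo[cols] < 0)

print(df_demo.describe())
```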

## 5. Accessing Data (Indexing)
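- Dot notation: `df.average` (works only when the column name is a valid Python identifier that does not clash with a DataFrame attribute or method)
- Bracket notation: `df['average']` (works for any column name)
- `.loc` for rows and columns by label: `df.loc[4, 'decimal_year']`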

In [ ]:
# Dot notation
df.average.head()

In [ ]:
# Bracket notation
df['average'].head()

In [ ]:
# .loc for specific row and column
df.loc[4, 'decimal_year']
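
`.loc` selects by index label, not by position. With the default `RangeIndex` the two coincide, so `df.loc[4, 'decimal_year']` is the fifth row; after filtering or re-indexing they can differ, and `.iloc` is the position-based counterpart. A small made-up frame shows the difference:

In [ ]:
```python
import pandas as pd

# Made-up frame whose index labels (10, 11, 12) do not match positions (0, 1, 2)
df_idx = pd.DataFrame({'decimal_year': [1960.04, 1960.13, 1960.21]},
                      index=[10, 11, 12])

by_label = df_idx.loc[11, 'decimal_year']   # row whose index label is 11
by_position = df_idx.iloc[1, 0]             # second row, first column
print(by_label, by_position)

# Label 0 does not exist here, so .loc raises KeyError
try:
    df_idx.loc[0, 'decimal_year']
    label_zero_found = True
except KeyError:
    label_zero_found = False
print("label 0 present:", label_zero_found)
```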

## 6. Boolean Indexing
Filter rows based on logical conditions, e.g., all data from 1960.

In [ ]:
df_1960 = df[df.year == 1960]
df_1960.head()
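
Several conditions can be combined with `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than `==`. `Series.between()` and `Series.isin()` cover ranges and lists of values. A sketch on made-up rows:

In [ ]:
```python
import pandas as pd

# Made-up rows for illustration only
df_demo = pd.DataFrame({
    'year':    [1959, 1960, 1960, 1961, 1962],
    'month':   [12, 1, 6, 3, 7],
    'average': [314.7, 316.4, 320.0, 317.9, 320.4],
})

# June to August 1960: wrap each condition in parentheses, join with &
summer_1960 = df_demo[(df_demo['year'] == 1960) & (df_demo['month'].between(6, 8))]

# .isin() selects several discrete values at once
early_years = df_demo[df_demo['year'].isin([1960, 1961])]
print(summer_1960, early_years, sep='\n\n')
```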

## 7. Query Method
`DataFrame.query()` is an alternative to boolean indexing in which the condition is written as a string expression.

In [ ]:
df_1961 = df.query('year == 1961')
df_1961.head()
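
Inside a `query()` string, `and`/`or` combine conditions and a leading `@` refers to a Python variable in the surrounding scope. A sketch with made-up rows:

In [ ]:
```python
import pandas as pd

# Made-up rows for illustration only
df_demo = pd.DataFrame({
    'year':    [1960, 1961, 1961, 1962],
    'month':   [5, 4, 11, 5],
    'average': [320.0, 318.5, 315.3, 320.4],
})

target_year = 1961
# '@target_year' reads the Python variable; 'and' joins the two conditions
first_half_1961 = df_demo.query('year == @target_year and month <= 6')
print(first_half_1961)
```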

## 8. Descriptive Statistics
Quickly summarize numeric columns.

In [ ]:
# pandas .std() returns the sample standard deviation (ddof=1) by default
mean_co2 = df['average'].mean()
std_co2 = df['average'].std()
min_co2 = df['average'].min()
max_co2 = df['average'].max()
print(f"CO₂ mean={mean_co2:.2f}, std={std_co2:.2f}, min={min_co2:.2f}, max={max_co2:.2f}")
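
The same statistics can be computed per group, for example one row per year, with `groupby` followed by `agg`. A sketch on made-up monthly values (not real measurements):

In [ ]:
```python
import pandas as pd

# Made-up monthly values for two years
df_demo = pd.DataFrame({
    'year':    [1960, 1960, 1960, 1961, 1961, 1961],
    'average': [316.0, 318.0, 320.0, 317.0, 319.0, 321.0],
})

# One row per year, one column per statistic of that year's monthly values
annual = df_demo.groupby('year')['average'].agg(['mean', 'std', 'min', 'max', 'count'])
print(annual)
```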

## 9. Plotting: Line Plot
Show the trend of average CO₂ over time.

In [ ]:
plt.figure(figsize=(10,5))
plt.plot(df['decimal_year'], df['average'], label='Monthly Mean CO₂')
plt.xlabel('Year')
plt.ylabel('CO₂ (ppm)')
plt.title('Mauna Loa Monthly Mean CO₂')
plt.legend()
plt.show()
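
The cell above uses pyplot's state-based interface. An equivalent object-oriented sketch keeps explicit `fig`/`ax` handles, which makes it easy to customise one axes or save the figure. The data here are a made-up straight line, not the NOAA record.

In [ ]:
```python
import io

import matplotlib
matplotlib.use('Agg')  # only so the sketch runs without a display; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Made-up series standing in for df['decimal_year'] and df['average']
t = np.linspace(1960, 1965, 61)
co2 = 316 + 0.8 * (t - 1960)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(t, co2, label='Monthly Mean CO₂')
ax.set_xlabel('Year')
ax.set_ylabel('CO₂ (ppm)')
ax.set_title('Object-oriented version of the line plot')
ax.legend()

# Save to an in-memory buffer; pass a filename such as 'co2.png' to write to disk instead
buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=100)
plt.close(fig)
```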

## 10. Plotting: Scatter Plot
Visualize the relationship between two variables.

In [ ]:
plt.figure(figsize=(8,5))
plt.scatter(df['decimal_year'], df['average'], alpha=0.5)
plt.xlabel('Year')
plt.ylabel('CO₂ (ppm)')
plt.title('Scatter Plot of CO₂ Data')
plt.show()
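
Because both axes here are time-ordered, this scatter mostly repeats the line plot. One way to make a scatter carry more information is to colour the points by a third variable through the `c=` argument. The sketch below colours by calendar month (a choice made here for illustration) on synthetic data built from a linear trend plus a sinusoidal seasonal cycle.

In [ ]:
```python
import matplotlib
matplotlib.use('Agg')  # only so the sketch runs without a display; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: linear trend plus a sinusoidal seasonal cycle (not real measurements)
t = 1960 + np.arange(48) / 12
month = np.arange(48) % 12 + 1
co2 = 316 + 0.8 * (t - 1960) + 3 * np.sin(2 * np.pi * t)

fig, ax = plt.subplots(figsize=(8, 5))
# 'twilight' is a cyclic colormap, so December and January get similar colours
sc = ax.scatter(t, co2, c=month, cmap='twilight', s=15)
fig.colorbar(sc, ax=ax, label='Month')
ax.set_xlabel('Year')
ax.set_ylabel('CO₂ (ppm)')
plt.close(fig)
```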

## 11. Subplots
Compare deseasonalised and average CO₂ levels.

In [ ]:
fig, axs = plt.subplots(2, 1, figsize=(10,8), sharex=True)
axs[0].plot(df['decimal_year'], df['average'], label='Average')
axs[0].set_ylabel('Average CO₂')
axs[0].legend()
axs[1].plot(df['decimal_year'], df['deseasonalised'], color='orange', label='Deseasonalised')
axs[1].set_xlabel('Year')
axs[1].set_ylabel('Deseasonalised CO₂')
axs[1].legend()
plt.tight_layout()
plt.show()
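
Since `deseasonalised` has the seasonal cycle removed, the difference `average - deseasonalised` approximately isolates that cycle (assuming the removal is additive, which is an assumption here). Averaging the difference by calendar month then gives a simple seasonal climatology. A sketch on synthetic data with a hypothetical cycle peaking in May:

In [ ]:
```python
import numpy as np
import pandas as pd

# Made-up monthly data: trend plus a hypothetical seasonal cycle (not NOAA values)
months = np.tile(np.arange(1, 13), 3)
trend = 316 + 0.07 * np.arange(36)
seasonal = 3 * np.cos(2 * np.pi * (months - 5) / 12)   # peaks in month 5
df_demo = pd.DataFrame({'month': months,
                        'deseasonalised': trend,
                        'average': trend + seasonal})

# Subtracting the deseasonalised series leaves the seasonal signal
df_demo['seasonal'] = df_demo['average'] - df_demo['deseasonalised']

# Mean seasonal anomaly for each calendar month
climatology = df_demo.groupby('month')['seasonal'].mean()
print(climatology.round(2))
```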

## 12. Trend Analysis / Regression
Fit a linear regression line to quantify the CO₂ trend over time.

In [ ]:
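# Linear fit: y = m*x + c (np.polyfit returns coefficients from the highest degree down)
x = df['decimal_year']
y = df['average']
slope, intercept = np.polyfit(x, y, 1)
# The intercept is the fitted value extrapolated to year 0, so it has no physical meaning
print(f"Slope: {slope:.4f} ppm/year, Intercept: {intercept:.2f} ppm")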

# Plot with regression line
plt.figure(figsize=(10,5))
plt.scatter(x, y, s=10, alpha=0.5, label='Data')
plt.plot(x, slope*x + intercept, color='red', label='Trend Line')
plt.xlabel('Year')
plt.ylabel('CO₂ (ppm)')
plt.title('CO₂ Trend with Linear Regression')
plt.legend()
plt.show()
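
A straight line summarises the average growth rate, but it says nothing about how well it fits. The coefficient of determination R² = 1 - SS_res/SS_tot gives one measure, and if the growth rate itself changes over time, a higher-degree polynomial may fit better. The sketch below compares degree-1 and degree-2 `np.polyfit` fits on synthetic, deliberately curved data; `r_squared` is a helper defined here, not a numpy function.

In [ ]:
```python
import numpy as np

# Made-up data with a slowly accelerating trend plus noise (not real measurements)
rng = np.random.default_rng(0)
x = np.linspace(1960, 2020, 121)
y = 315 + 0.8 * (x - 1960) + 0.012 * (x - 1960) ** 2 + rng.normal(0, 0.5, x.size)

def r_squared(y, y_fit):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Degree-1 and degree-2 least-squares fits
lin = np.polyfit(x, y, 1)
quad = np.polyfit(x - 1960, y, 2)   # centring x keeps the quadratic fit well conditioned

r2_lin = r_squared(y, np.polyval(lin, x))
r2_quad = r_squared(y, np.polyval(quad, x - 1960))
print(f"R² linear: {r2_lin:.4f}, quadratic: {r2_quad:.4f}")
```

Plotting `y - np.polyval(lin, x)` against `x` is another quick check: a systematic curve in the residuals, rather than random scatter, suggests the linear model misses part of the trend.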

## Summary
- **Data Import, Cleaning, and Exploration**: Use pandas for robust data handling.
- **Indexing and Selection**: Dot/bracket notation and `.loc` for flexibility.
- **Statistics**: Use pandas/numpy for descriptive and regression analysis.
- **Plotting**: Matplotlib for trends, relationships, and comparisons.

Use these techniques as the foundation for more advanced data analysis and visualization in Python.