# Python Data Analysis Cheat Sheet

## Common Libraries to Import

```py
# Core Libraries
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical Analysis
from scipy import stats
from scipy.stats import iqr, ttest_ind, pearsonr

# Machine Learning & Modeling
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, accuracy_score

# Data Preprocessing
from sklearn.preprocessing import StandardScaler, LabelEncoder
```

## Data Exploration

```py
df.head()            # First 5 rows
df.info()            # Data types & non-null counts
df.describe()        # Summary statistics
df.columns           # Column names
df.shape             # Rows and columns
df.isnull().sum()    # Missing values
```
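
The calls above assume an existing `df`. As a minimal sketch, the toy frame below (its column names and values are made up for illustration) shows what a few of them return:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one text column, one numeric column with a gap
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [1.5, np.nan, 3.0],
})

n_rows, n_cols = df.shape      # (rows, columns) tuple
missing = df.isnull().sum()    # Series: missing-value count per column
summary = df.describe()        # numeric columns only by default

print(n_rows, n_cols)
print(missing)
print(summary)
```

Note that `describe()` skips the text column here and counts only the non-missing values in `score`.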

## Data Cleaning

```py
df.dropna()                         # Drop rows with missing values (returns a new DataFrame)
df.fillna(value)                    # Fill missing values with `value` (returns a new DataFrame)
df.duplicated().sum()               # Count duplicate rows
df.drop_duplicates(inplace=True)    # Remove duplicate rows in place
```
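
Most of these methods return a new DataFrame rather than modifying `df`, which is easy to miss. A small sketch with hypothetical data (the `city` and `temp` columns are invented for this example) makes the difference visible:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one duplicated row and one missing value
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Lima", "Pune"],
    "temp": [4.0, 19.0, 19.0, np.nan],
})

n_missing = int(df["temp"].isnull().sum())   # missing values in 'temp'
n_dupes = int(df.duplicated().sum())         # fully duplicated rows

# Each call returns a new DataFrame; df itself is left unchanged
deduped = df.drop_duplicates()
filled = deduped.fillna({"temp": deduped["temp"].mean()})  # fill with the column mean
dropped = deduped.dropna()                                 # or drop incomplete rows

print(n_missing, n_dupes, len(df), len(deduped), len(dropped))
```

Filling with the column mean is only one possible choice; a constant or the median may suit other data better.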

## Visualization

```py
sns.histplot(df['column'])                            # Histogram
sns.boxplot(x='column', data=df)                      # Boxplot
sns.scatterplot(x='x', y='y', data=df)                # Scatter plot
sns.heatmap(df.corr(numeric_only=True), annot=True)   # Correlation heatmap (numeric columns only)
```

## Statistical Analysis

```py
iqr(df['column'])                   # Interquartile range (NaN if the column has missing values)
stats.ttest_ind(a, b)               # Independent two-sample t-test
stats.pearsonr(x, y)                # Pearson correlation coefficient and p-value
```
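
To show what these functions return, here is a minimal sketch on small hand-made arrays (the values are illustrative, not real data):

```python
import numpy as np
from scipy import stats

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = a + 1.0                        # same spread as a, mean shifted by 1

spread = stats.iqr(a)              # 75th percentile minus 25th percentile
t_res = stats.ttest_ind(a, b)      # result exposes .statistic and .pvalue
r, p_value = stats.pearsonr(a, b)  # correlation coefficient and its p-value

print(spread)                      # 2.0
print(t_res.statistic, t_res.pvalue)
print(r)                           # approximately 1.0, since b is linear in a
```

`ttest_ind` assumes equal variances by default; passing `equal_var=False` switches to Welch's t-test.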

## Machine Learning Basics

```py
X = df.drop('target', axis=1)
y = df['target']

# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
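
The snippet above stops at `predictions`. One way to continue is to score them with `mean_squared_error`, which is already imported at the top of this sheet. The sketch below runs end to end on synthetic data; the `x1` and `x2` columns, the coefficients, and the noise level are assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: target = 3*x1 - 2*x2 + small noise
rng = np.random.default_rng(42)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["target"] = 3 * df["x1"] - 2 * df["x2"] + rng.normal(scale=0.1, size=200)

X = df.drop("target", axis=1)
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

mse = mean_squared_error(y_test, predictions)  # lower is better
r2 = model.score(X_test, y_test)               # R^2 on held-out rows

print(f"MSE: {mse:.4f}  R^2: {r2:.4f}")
print("Coefficients:", model.coef_)
```

`accuracy_score` applies to classification models, so it is not used with `LinearRegression` here.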

## Preprocessing

```py
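scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit on training data only to avoid leakage
X_test_scaled = scaler.transform(X_test)        # Reuse the training mean and std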

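# LabelEncoder maps categories to integers 0..n-1 in sorted order.
# It is intended for target labels; the codes imply an ordering if used on features.
encoder = LabelEncoder()
df['encoded'] = encoder.fit_transform(df['categorical_column'])
```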
```
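
As a rough illustration of what both transformers learn, the sketch below uses tiny hand-made inputs (all values are hypothetical). The scaler is fitted on the training rows and then reused on the test row:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical single-feature data
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from training rows
X_test_scaled = scaler.transform(X_test)        # applies the same mean and std

encoder = LabelEncoder()
codes = encoder.fit_transform(["red", "green", "red", "blue"])

print(scaler.mean_, scaler.scale_)
print(X_test_scaled)
print(list(encoder.classes_))  # ['blue', 'green', 'red'] (sorted)
print(list(codes))             # [2, 1, 2, 0]
```

`encoder.inverse_transform(codes)` recovers the original labels when they are needed again.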