## 9 Must-Know Python-Pandas Operations for Working with Data

**Data Import**
* `pd.read_csv('file.csv')`
* `pd.read_excel('file.xlsx', sheet_name='Sheet1')`
* `pd.read_sql(query, connection)`
* `pd.read_json('file.json')`
* `pd.read_parquet('file.parquet')`
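
A minimal sketch of the readers above, assuming in-memory text in place of real files (the sample data and the names `csv_text`, `json_text`, `df_csv`, `df_json` are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'.
csv_text = "id,name,score\n1,Ann,9.5\n2,Bob,7.0\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON records standing in for 'file.json'.
json_text = '[{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]'
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv.dtypes)  # column types are inferred from the text
print(df_json)
```

`read_excel` and `read_parquet` follow the same pattern but typically need an extra engine installed (for example `openpyxl` or `pyarrow`).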

**Data Manipulation**
* `df.groupby('col').agg({'col2': ['mean', 'sum']})`
* `df.merge(df2, on='key', how='left')`
* `df.pivot_table(values='val', index='idx', columns='col')`
* `df.sort_values(['col1', 'col2'])`
* `df.melt(id_vars=['id'], value_vars=['A', 'B'])`
* `df.apply(lambda x: x**2)` # Apply function
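
To see how these reshape data, here is a small sketch on invented sales figures (the `sales` and `regions` frames and their column names are assumptions, not part of the list above):

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 150, 80, 120],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Aggregate: mean and sum of revenue per store.
summary = sales.groupby("store").agg({"revenue": ["mean", "sum"]})

# Left join: keep every sales row, attach the region when it exists.
merged = sales.merge(regions, on="store", how="left")

# Long -> wide: one row per store, one column per quarter.
wide = sales.pivot_table(values="revenue", index="store", columns="quarter")

# Wide -> long again with melt.
long = wide.reset_index().melt(id_vars=["store"], value_vars=["Q1", "Q2"])

# Sort by store, then by revenue within each store.
ordered = sales.sort_values(["store", "revenue"])
```

`pivot_table` and `melt` are roughly inverse operations, which is why `long` ends up with the same four observations as `sales`.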

**Data Cleaning**
* `df.dropna(subset=['col'], how='any')`
* `df.ffill()` # Forward fill (`fillna(method='ffill')` is deprecated since pandas 2.1)
* `df.drop_duplicates(subset=['col'])`
* `df.replace({'old': 'new'})`
* `df['col'].astype('category')`
* `df.interpolate(method='linear')`
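
The cleaning calls above can be tried on a tiny frame with gaps. This is only a sketch; the `raw` frame and its values are invented:

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", None],
    "temp": [20.0, np.nan, 25.0, 30.0],
})

# Drop rows where 'city' is missing.
no_missing_city = raw.dropna(subset=["city"], how="any")

# Fill the gap in 'temp' two ways: carry the last value forward,
# or interpolate linearly between neighbours.
filled = raw["temp"].ffill()
interpolated = raw["temp"].interpolate(method="linear")

# Keep the first row for each distinct city (missing counts as one value).
unique_cities = raw.drop_duplicates(subset=["city"])

# Replace exact values, then store the column as a categorical.
renamed = raw.replace({"NYC": "New York"})
as_category = renamed["city"].astype("category")
```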

**String Operations**
* `df['col'].str.contains('pattern')`
* `df['col'].str.extract('([A-Za-z]+)')`
* `df['col'].str.split('_').str[0]`
* `df['col'].str.lower()`
* `df['col'].str.strip()`
* `df['col'].str.replace(r'\s+', '_', regex=True)`
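
A short sketch of the `.str` accessor on a made-up Series (the sample strings are assumptions chosen to exercise each call):

```python
import pandas as pd

s = pd.Series(["  Alpha_01 ", "beta_02", "Gamma  Ray_03"])

# Trim surrounding whitespace, then normalise case.
cleaned = s.str.strip().str.lower()

# Boolean mask: which entries contain the pattern?
has_alpha = cleaned.str.contains("alpha")

# Text before the first underscore.
prefix = cleaned.str.split("_").str[0]

# First run of letters; extract returns a DataFrame, one column per group.
letters = cleaned.str.extract(r"([a-z]+)")

# Collapse any run of whitespace into a single underscore.
underscored = cleaned.str.replace(r"\s+", "_", regex=True)
```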

**Data Export**
* `df.to_csv('output.csv', index=False)`
* `df.to_excel('output.xlsx', sheet_name='Sheet1')`
* `df.to_parquet('output.parquet')`
* `df.to_json('output.json', orient='records')`
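
As a rough illustration, `to_csv` and `to_json` return a string when no path is given, which makes a quick round trip easy to check. The frame below is invented:

```python
import io
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})

# With no path argument, the serialised text is returned instead of written.
csv_text = df.to_csv(index=False)         # index=False omits the row labels
json_text = df.to_json(orient="records")  # a list of one object per row

# Read the CSV text back to confirm nothing was lost.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```

Passing a file name such as `'output.csv'` writes to disk instead; `to_excel` and `to_parquet` always need a target and an installed engine.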

**Data Selection**
* `df['column']` # Single column
* `df.loc['row', 'col']` # Label based
* `df.iloc[0:5, 0:2]` # Integer based
* `df.query('col > 5')` # SQL-like filtering
* `df[df['col'].isin(['A', 'B'])]` # Multiple values
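
The selection styles side by side, sketched on a hypothetical three-row frame (the index labels `r1` to `r3` and the `grade` column are assumptions):

```python
import pandas as pd

df = pd.DataFrame(
    {"col": [3, 8, 12], "grade": ["A", "B", "C"]},
    index=["r1", "r2", "r3"],
)

single = df["col"]                  # one column as a Series
by_label = df.loc["r2", "grade"]    # row label and column label
by_position = df.iloc[0:2, 0:1]     # first two rows, first column
filtered = df.query("col > 5")      # rows where the expression is true
subset = df[df["grade"].isin(["A", "B"])]  # rows matching any listed value
```

Note that `.loc` slices include the end label, while `.iloc` slices exclude the end position, as in plain Python.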

**Statistics**
* `df.describe()` # Summary statistics
* `df['col'].agg(['mean', 'median', 'std'])`
* `df['col'].value_counts(normalize=True)`
* `df.corr(method='pearson', numeric_only=True)` # Skip non-numeric columns
* `df.cov()` # Covariance matrix
* `df.quantile([0.25, 0.5, 0.75])`
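
A small sketch of the statistics calls, assuming an invented frame with two numeric columns and one label column. The numeric columns are selected first so `corr` and `quantile` do not trip over strings:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "label": ["a", "b", "a", "a"],
})

summary = df.describe()                            # count, mean, std, quartiles
stats = df["x"].agg(["mean", "median", "std"])     # several statistics at once
shares = df["label"].value_counts(normalize=True)  # relative frequencies

numeric = df.select_dtypes(include=["number"])
corr = numeric.corr(method="pearson")              # y = 2x, so fully correlated
quartiles = numeric.quantile([0.25, 0.5, 0.75])
```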

**Time Series**
* `df.resample('ME').mean()` # Monthly average (use `'M'` on pandas < 2.2)
* `df.rolling(window=7).mean()`
* `df.shift(periods=1)` # Shift values
* `pd.date_range('2024', periods=12, freq='ME')` # Month ends (use `'M'` on pandas < 2.2)
* `df.asfreq('D', method='ffill')`
* `df['date'].dt.strftime('%Y-%m-%d')`
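
These calls expect a `DatetimeIndex`. The sketch below builds 60 days of made-up data and uses the month-start alias `'MS'`, which is an assumption chosen because it is spelled the same across pandas versions:

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=60, freq="D")
ts = pd.DataFrame({"value": range(60)}, index=idx)

monthly = ts.resample("MS").mean()     # one row per calendar month
rolling = ts.rolling(window=7).mean()  # first 6 rows are NaN (window not full)
lagged = ts.shift(periods=1)           # each value moves down one day

# Keep every other day, then restore daily frequency by forward filling.
sparse = ts.iloc[::2]
daily = sparse.asfreq("D", method="ffill")

# Format dates as text through the .dt accessor on a Series.
labels = pd.Series(idx).dt.strftime("%Y-%m-%d")
```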

**Advanced Features**
* `df.pipe(func)` # Method chaining
* `pd.eval('df1 + df2')` # Expression eval
* `df.memory_usage(deep=True)`
* `df.select_dtypes(include=['number'])`
* `df.nlargest(5, 'col')` # Top N values
* `df.explode('col')` # Expand list column
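
A brief sketch of several of these on an invented frame. The helper `add_rank` is hypothetical and only exists to give `pipe` something to call:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [10, 30, 20],
    "tags": [["x", "y"], ["z"], []],
})

def add_rank(frame):
    """Hypothetical helper: add a rank column, highest score first."""
    return frame.assign(rank=frame["score"].rank(ascending=False))

ranked = df.pipe(add_rank)                 # same as add_rank(df), but chainable
top2 = df.nlargest(2, "score")             # rows with the two highest scores
numeric = df.select_dtypes(include=["number"])
exploded = df.explode("tags")              # one row per list element
bytes_per_column = df.memory_usage(deep=True)
```

An empty list in `tags` still produces one row after `explode`, with a missing value in that column.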

**Tips & Best Practices**
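* Call `.copy()` on a filtered or sliced DataFrame before modifying it, so later assignments do not trigger `SettingWithCopyWarning` or touch the original
* Chain method calls (for example `df.dropna().sort_values('col')`) instead of creating intermediate variables
* Use `.astype('category')`, or `dtype='category'` at load time, for low-cardinality string columns to reduce memory use
* Avoid `inplace=True` where possible; reassigning the result (`df = df.dropna()`) is clearer and works with method chaining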
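
A minimal sketch of these tips together, using an invented `df` with `city` and `sales` columns:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NYC", "LA", "NYC", "LA"],
    "sales": [10, 20, 30, 40],
})

# .copy() makes the subset independent, so editing it leaves df alone.
nyc = df[df["city"] == "NYC"].copy()
nyc["sales"] = nyc["sales"] * 2

# Method chaining with reassignment instead of inplace=True.
result = (
    df.assign(city=df["city"].astype("category"))  # categorical saves memory
      .sort_values("sales", ascending=False)
      .reset_index(drop=True)
)
```

Each step in the chain returns a new DataFrame, so `df` itself is unchanged at the end.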