# Pandas Cheat Sheet – Quick Reference (2025–2026)

**Last updated:** February 2026

Pandas builds on NumPy to provide high-level, labeled data structures for data analysis.

---

## 1. Import




```python
import pandas as pd
import numpy as np
```




---

## 2. Core Data Structures

* **Series:** 1D labeled array
* **DataFrame:** 2D tabular data

---

## 3. Series Creation


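```python
# From a list, with an explicit index and a name
pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='data')
# a    1
# b    2
# c    3
# Name: data, dtype: int64

# From a dict (keys become the index)
pd.Series({'a': 1, 'b': 2})
# a    1
# b    2
# dtype: int64

# From a NumPy array (default integer index)
pd.Series(np.array([1, 2, 3]))
# 0    1
# 1    2
# 2    3
# dtype: int64

# From a scalar, broadcast to every index label
pd.Series(5, index=['a', 'b', 'c'])
# a    5
# b    5
# c    5
# dtype: int64
```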


Attributes: `values`, `index`, `name`, `dtype`
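
As a quick illustration of the attributes listed above, the sketch below reads each one from a small Series. The sample values are the same as in the first creation example; `to_numpy()` is shown next to `values` because it is the accessor the pandas documentation generally recommends for getting the underlying array.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='data')

labels = s.index.tolist()   # the index labels as a plain list
series_name = s.name        # the name passed at construction
kind = s.dtype.kind         # 'i' for a signed integer dtype
array = s.to_numpy()        # underlying data as a NumPy array (like s.values)
```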

---

## 4. DataFrame Creation


```python
# From a dict of columns, with explicit row labels
pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['row1', 'row2'])
#       A  B
# row1  1  3
# row2  2  4

# From a 2D NumPy array (values are random, so yours will differ)
pd.DataFrame(np.random.rand(3, 2), columns=['col1', 'col2'])
#        col1      col2
# 0  0.608604  0.595744
# 1  0.759840  0.123125
# 2  0.110688  0.498186

# From a list of row dicts
pd.DataFrame([{'A': 1, 'B': 3}, {'A': 2, 'B': 4}])
#    A  B
# 0  1  3
# 1  2  4
```



From structured NumPy arrays:


```python
data = np.array(
    [(1, 'Alice', 3.8), (2, 'Bob', 4.1)],
    dtype=[('id', 'i4'), ('name', 'U10'), ('gpa', 'f4')],
)

pd.DataFrame.from_records(data)
#    id   name  gpa
# 0   1  Alice  3.8
# 1   2    Bob  4.1
```



---

## 5. Vectorized Operations

```python
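df['C'] = df['A'] + df['B']   # element-wise addition, aligned on the index
df['A'] > 1                   # boolean Series, usable as a mask

df.sum()        # column-wise totals
df.mean()       # column-wise means; pass numeric_only=True if df has non-numeric columns
df.describe()   # summary statistics for numeric columns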
```
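
The snippets above assume an existing `df`. A self-contained sketch, using made-up sample values, might look like this:

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

df['C'] = df['A'] + df['B']   # element-wise addition, no Python loop
mask = df['A'] > 1            # boolean Series: False, True, True

col_sums = df.sum()           # reduces down each column (axis=0 is the default)
row_means = df.mean(axis=1)   # axis=1 reduces across the columns of each row
```

Reductions such as `sum` and `mean` work column by column unless `axis=1` is passed.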

---

## 6. Apply & String Ops

```python
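df['A'].apply(lambda x: x**2)   # element-wise; df['A'] ** 2 is the vectorized equivalent
df.apply(np.sum, axis=1)        # row-wise; needs all-numeric columns (df.sum(axis=1) is faster)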

df['name'].str.lower()
df['name'].str.startswith('A')
```
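
A runnable sketch of the calls above, on hypothetical sample data that includes a missing name. It compares `apply` with the equivalent vectorized expression and shows how the `.str` accessor treats missing values (the `na=False` argument is one possible way to handle them).

```python
import pandas as pd

# Hypothetical sample data; the None stands in for a missing name.
df = pd.DataFrame({'A': [1, 2, 3], 'name': ['Alice', 'Bob', None]})

squared_apply = df['A'].apply(lambda x: x ** 2)   # calls the lambda once per element
squared_vec = df['A'] ** 2                        # same values, computed in one vectorized step

lower = df['name'].str.lower()                        # missing values stay missing
starts_a = df['name'].str.startswith('A', na=False)   # na=False treats missing as "no match"
```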

---

## 7. Indexing & Selection

```python
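df['A']          # single column -> Series
df[['A','B']]    # list of columns -> DataFrame

df.loc['row1','A']   # label-based
df.iloc[0,1]         # position-based

df.at['row1','A']    # fast scalar access by label
df.iat[0,1]          # fast scalar access by position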
```

Boolean & query:

```python
df.loc[df['A']>1]
df.query('A > 1 and B < 5')
```
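
To make the label-versus-position distinction concrete, here is a small sketch that reuses the `row1` / `row2` frame from the DataFrame creation section. Note that `at`, like `loc`, takes index labels, while `iat`, like `iloc`, takes integer positions.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['row1', 'row2'])

by_label = df.loc['row1', 'A']     # label-based lookup
by_position = df.iloc[0, 1]        # row 0, column 1 (column 'B')
fast_label = df.at['row2', 'B']    # scalar access by label
fast_position = df.iat[1, 0]       # scalar access by position

filtered = df.loc[df['A'] > 1]            # boolean mask
queried = df.query('A > 1 and B < 5')     # same filter as an expression string
```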

---

## 8. Sorting, Grouping, Reshaping

```python
df.sort_values('A', ascending=False)
df.sort_index()

df.groupby('category').mean(numeric_only=True)  # numeric_only avoids errors on non-numeric columns
df.groupby('category').agg({'A':'sum','B':'mean'})

df.pivot_table(values='value', index='row', columns='col', aggfunc='mean')
df.melt(id_vars=['id'], value_vars=['A','B'])

df.stack()
df.unstack()
```
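
The following sketch runs a grouped aggregation and a melt / pivot round trip on a small hypothetical frame. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'category': ['x', 'x', 'y'],
    'A': [1, 2, 3],
    'B': [10.0, 20.0, 30.0],
})

# One aggregation per column, keyed by column name.
agg = df.groupby('category').agg({'A': 'sum', 'B': 'mean'})

# Wide -> long: one row per (category, variable) pair.
long_df = df.melt(id_vars=['category'], value_vars=['A', 'B'])

# Long -> wide again, averaging duplicates within each category.
wide_df = long_df.pivot_table(
    values='value', index='category', columns='variable', aggfunc='mean'
)
```

`melt` names its output columns `variable` and `value` by default, which is why the pivot refers to them.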

---

## 9. Merging & Concatenation

```python
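pd.merge(df1, df2, on='key', how='left')   # column-to-column match
df1.join(df2.set_index('key'), on='key')   # join matches df1['key'] against the other frame's index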

pd.concat([df1, df2], axis=0)
pd.concat([df1, df2], axis=1)
```
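
A self-contained sketch of a left merge, the equivalent `join`, and a row-wise `concat`. The frames and the `left_val` / `right_val` column names are hypothetical. Because `join` matches against the other frame's index, the sketch moves `key` into the index of `df2` first.

```python
import pandas as pd

# Hypothetical sample frames sharing a 'key' column.
df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'left_val': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'right_val': [10, 20, 40]})

# Left merge: keep every row of df1; unmatched keys get NaN on the right.
merged = pd.merge(df1, df2, on='key', how='left')

# join() is a left join by default and looks up df1['key'] in df2's index.
joined = df1.join(df2.set_index('key'), on='key')

# Stack rows; ignore_index=True rebuilds a clean 0..n-1 index.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)
```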

---

## 10. Data Types & Missing Data

```python
df['col'] = df['col'].astype('float32')
df['cat'] = df['cat'].astype('category')

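df.fillna(0)   # returns a new DataFrame; assign the result to keep it
df.dropna()    # drops rows containing any missing value
df.isna()      # boolean mask of missing values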
```
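
A short sketch on hypothetical data showing that `fillna` and `dropna` return new objects rather than modifying `df`, and that `isna().sum()` gives a per-column count of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({'col': [1.0, np.nan, 3.0], 'cat': ['lo', 'hi', 'lo']})

df['col'] = df['col'].astype('float32')
df['cat'] = df['cat'].astype('category')

filled = df.fillna({'col': 0})   # per-column fill values; df itself is unchanged
dropped = df.dropna()            # rows with any missing value removed
n_missing = df.isna().sum()      # count of missing values per column
```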

---

## 11. File I/O

```python
pd.read_csv('data.csv', parse_dates=['date'], index_col=0)
pd.read_excel('data.xlsx')
pd.read_json('data.json')
pd.read_sql('SELECT * FROM my_table', conn)  # my_table is a placeholder; TABLE itself is a reserved word in SQL

df.to_csv('out.csv', index=False)
df.to_excel('out.xlsx')
df.to_json('out.json', orient='records')
```
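
The readers and writers also accept file-like objects, so a round trip can be sketched without touching disk. The example below uses an in-memory `io.StringIO` buffer in place of a real file; the column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data with a datetime column.
df = pd.DataFrame({
    'date': pd.to_datetime(['2026-01-01', '2026-01-02']),
    'value': [1.5, 2.5],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False keeps the row index out of the file
buffer.seek(0)                   # rewind before reading back

# CSV stores dates as text, so parse_dates is needed to restore the dtype.
restored = pd.read_csv(buffer, parse_dates=['date'])
```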

---

## 12. NumPy Integration

```python
arr = np.random.rand(3,2)
df = pd.DataFrame(arr, columns=['X','Y'])

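np_arr = df.to_numpy(copy=True)  # to_numpy() is preferred over .values; copy=True gives a writable array
np_arr += 1                      # modifies the copy only, df is unchanged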

df['Z'] = np.sqrt(df['X'])
```
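
A deterministic version of the same round trip, using fixed values instead of random ones so the results are easy to check. It assumes an explicit copy is wanted when converting to NumPy, so that in-place changes to the array do not depend on whether pandas returned a view.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1.0, 4.0], [9.0, 16.0]]), columns=['X', 'Y'])

np_arr = df.to_numpy(copy=True)   # independent, writable copy of the data
np_arr += 1                       # changes the copy, not df

root = np.sqrt(df['X'])           # NumPy ufuncs accept a Series and return a Series
df['Z'] = root
```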

---

## Performance Tips & Gotchas

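* For speed: vectorized operations > `apply` > Python loops
* Prefer a single `loc` / `iloc` call over chained indexing such as `df[mask]['A'] = ...`
* Use `category` for low-cardinality strings to cut memory use and speed up grouping
* Beware `SettingWithCopyWarning` in pandas versions before 3.0: it signals an assignment that may have landed on a temporary copy. With Copy-on-Write (the default from pandas 3.0), chained assignment never modifies the original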
* Pandas aligns on index labels automatically
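
Two of the tips above can be checked with a short sketch on made-up data: arithmetic between Series aligns on labels rather than positions, and converting a repetitive string column to `category` reduces its memory footprint.

```python
import pandas as pd

# Index alignment: labels are matched, not positions.
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

aligned = s1 + s2                    # 'a' and 'd' exist on one side only -> NaN
filled = s1.add(s2, fill_value=0)    # treat the missing side as 0 instead

# Category dtype: three distinct strings repeated many times.
colors = pd.Series(['red', 'green', 'blue'] * 1000)
object_bytes = colors.memory_usage(deep=True)
category_bytes = colors.astype('category').memory_usage(deep=True)
```

Comparing `object_bytes` with `category_bytes` on your own data is a cheap way to decide whether the conversion is worthwhile.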