[Reference](https://towardsdatascience.com/the-6-top-pandas-mistakes-for-python-data-scientists-f551156c5c93)

# 1. Not specifying the data type
```python
import pandas as pd
headers = ['col1', 'col2']
dtypes = {'col1': 'str', 'col2': 'float'}
parse_dates = ['col1', 'col2']
# `file` is the path to a tab-separated file with no header row
df = pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)
```
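
To see what explicit types change, the following minimal sketch reads the same small tab-separated sample twice, once with inferred types and once with declared ones. The sample data and the column names (`id`, `price`, `created`) are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample; the values are illustrative only.
raw = "001\t1.5\t2021-01-01\n002\t2.0\t2021-01-02\n"
headers = ['id', 'price', 'created']

# Without dtypes, pandas infers 'id' as an integer and drops the leading zeros.
inferred = pd.read_csv(io.StringIO(raw), sep='\t', header=None, names=headers)

# With explicit dtypes, 'id' stays a string and 'created' is parsed as a date.
typed = pd.read_csv(
    io.StringIO(raw),
    sep='\t',
    header=None,
    names=headers,
    dtype={'id': 'str', 'price': 'float'},
    parse_dates=['created'],
)

print(inferred.dtypes)
print(typed.dtypes)
```

Declaring the type up front matters most for identifier-like columns, where silent conversion to a number loses information.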

# 2. Leaving extra DataFrames
```python
import pandas as pd
df1 = pd.read_csv('file.csv')
df2 = df1.dropna()
df3 = df1.groupby('item')
```
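
One way to avoid keeping intermediate frames around is to chain the steps, so that only the final result is bound to a name. This is a minimal sketch with made-up data standing in for `file.csv`; the `price` column and the `sum` aggregation are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical data standing in for 'file.csv'.
df = pd.DataFrame({
    'item': ['a', 'b', 'a', None],
    'price': [1.0, 2.0, 3.0, 4.0],
})

# Chain the steps so no intermediate DataFrame stays bound to a name.
totals = (
    df.dropna()
      .groupby('item')['price']
      .sum()
)

# If the source frame is no longer needed, drop the reference to it.
del df

print(totals)
```

Once no name refers to a frame, Python can reclaim its memory.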

# 3. Using slow getters and setters
```python
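# Assumes df, df_size (its number of rows), and profile (one row of values) are defined earlier.

# Using .at
for i in range(df_size):
    df.at[i] = profile
# Wall time: 22.3 s

# Using .iloc
for i in range(df_size):
    df.iloc[i] = profile
# Wall time: 19.1 s

# Using .iat, which sets a single cell, so it cannot replace multiple columns.
# Fast, but not comparable since only one column is replaced.
# Note: this is chained indexing, so the write may land on a temporary copy.
for i in range(df_size):
    df.iloc[i].iat[0] = profile['address']
# Wall time: 3.46 s

# Using .to_numpy()
# Note: to_numpy() may return a copy, in which case df itself is not modified.
for i in range(df_size):
    df.to_numpy()[i] = profile
# Wall time: 254 ms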
```
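
`.at` and `.iat` are scalar accessors, so each call takes both a row and a column. The sketch below shows that form next to a whole-column assignment that needs no loop. It is not the benchmark above; the frame and the `profile` values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame and replacement values, for illustration only.
df = pd.DataFrame({'name': ['x'] * 3, 'address': ['y'] * 3})
profile = {'name': 'Ann', 'address': '1 Main St'}

# .at is label-based and .iat is position-based; both write a single cell.
df.at[0, 'address'] = profile['address']
df.iat[1, 1] = profile['address']

# Assigning to a whole column at once avoids the Python-level loop.
df['name'] = profile['name']

print(df)
```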

# 4. Manually configuring Matplotlib
```python
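import matplotlib.pyplot as plt

# Manual Matplotlib setup: create the figure and axes before drawing on them
fig, ax = plt.subplots()
ax.hist(x=df['x'])
ax.set_xlabel('label for column X')
plt.show()

# pandas shortcut: plot straight from the Series
df['x'].plot()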
```

# 5. Low CPU utilisation
```python
import modin.pandas as pd
import numpy as np
frame_data = np.random.randint(0, 100, size=(2**10, 2**8))
df = pd.DataFrame(frame_data)
```

# 6. Series and DataFrame confusion

```python
import numpy as np
import pandas as pd

data = np.array(['C','O','E','U','S'])
ser = pd.Series(data)
print("The Series created from Array is: ", '\n', ser)

# DataFrame Example:
lst = ['COEUS', 'Enterprises', 'is', 'for', 'the', 'Best', 'Online', 'Courses']

# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print("The DataFrame is: ", '\n',df)
```

Output:

```
The Series created from Array is:  
 0    C
1    O
2    E
3    U
4    S
dtype: object
The DataFrame is:  
              0
0        COEUS
1  Enterprises
2           is
3          for
4          the
5         Best
6       Online
7      Courses
```
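
The confusion often shows up when selecting columns: single brackets return a Series, double brackets return a one-column DataFrame. A minimal sketch, with a made-up two-column frame for illustration:

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({'word': ['COEUS', 'Enterprises'], 'length': [5, 11]})

single = df['word']      # single brackets: one column as a Series
double = df[['word']]    # double brackets: a one-column DataFrame

print(type(single).__name__, single.shape)
print(type(double).__name__, double.shape)
```

A Series is one-dimensional, so its shape has a single entry, while the DataFrame keeps both a row and a column dimension.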
