### Pandas Exercises for Data Analysis

The full set of exercises is available <a href="https://www.machinelearningplus.com/python/101-pandas-exercises-python/">here</a>.

**1. How to import pandas and check the version?**

In [1]:
import pandas as pd
pd.__version__

'1.3.3'

**2. How to create a series from a list, numpy array and dict?**

In [15]:
import numpy as np

# Input
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

In [16]:
# Solution
ser1 = pd.Series(mylist)
ser2 = pd.Series(myarr)
ser3 = pd.Series(mydict)
print(ser3.head())             # only prints the first 5 entries

a    0
b    1
c    2
e    3
d    4
dtype: int32


**3. How to convert the index of a series into a column of a dataframe?**

In [40]:
# Input
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))
ser = pd.Series(mydict)

In [41]:
# Solution
df = ser.to_frame().reset_index()  # convert series 'ser' into a DataFrame and reset index to default
print(df.head())

  index  0
0     a  0
1     b  1
2     c  2
3     e  3
4     d  4


In [37]:
# .to_frame(name=None)
# Convert Series to DataFrame

s = pd.Series(["a", "b", "c"],
              name="vals")


In [38]:
s.to_frame()

  vals
0    a
1    b
2    c


In [None]:
# DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
# Reset the index, or a level of it

# Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex,
# this method can remove one or more levels.
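
The effect of the `drop` parameter described above can be seen in a small sketch. The series values and the name `vals` here are illustrative assumptions, not data from the exercise.

```python
import pandas as pd

# Hypothetical example data: a Series indexed by letters
ser = pd.Series([10, 20, 30], index=["a", "b", "c"], name="vals")

# Default (drop=False): the old index becomes a regular column named 'index'
df_keep = ser.to_frame().reset_index()

# drop=True discards the old index instead of turning it into a column
df_drop = ser.to_frame().reset_index(drop=True)

print(df_keep.columns.tolist())  # ['index', 'vals']
print(df_drop.columns.tolist())  # ['vals']
```

In both cases the resulting DataFrame gets a fresh default integer index; the only difference is whether the old labels survive as a column.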

**4. How to combine many series to form a dataframe?**

In [7]:
# Input
ser1 = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser2 = pd.Series(np.arange(26))

In [11]:
# Solution 1
df = pd.concat([ser1, ser2], axis=1)  # pd.concat([SERIES1, SERIES2, ...], axis=1) places the series side by side as columns
print(df.head())

   0  1
0  a  0
1  b  1
2  c  2
3  e  3
4  d  4


In [12]:
# Solution 2
df = pd.DataFrame({'col1': ser1, 'col2': ser2})  # pd.DataFrame({'COLUMNNAME1': SERIES1, 'COLUMNNAME2': SERIES2,...})
print(df.head())

  col1  col2
0    a     0
1    b     1
2    c     2
3    e     3
4    d     4


**5. How to assign a name to the series’ index?**

In [13]:
# Input
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))

In [14]:
# Solution
ser.name = 'alphabets'  # names the Series itself; to name the index instead, set ser.index.name
ser.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabets, dtype: object
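
Since a series has both a name of its own and a name for its index, it may help to see the two side by side. This is a minimal sketch; the labels `position` and `pos` are hypothetical and chosen only for illustration.

```python
import pandas as pd

ser = pd.Series(list("abc"))

ser.name = "alphabets"        # name of the Series (used as the column name in a DataFrame)
ser.index.name = "position"   # name of the index; 'position' is a hypothetical label

# rename_axis returns a new Series with the index renamed, leaving the original unchanged
renamed = ser.rename_axis("pos")

print(ser.index.name)      # position
print(renamed.index.name)  # pos
```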

**6. How to get the items of series A not present in series B?**

In [17]:
# Input
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

In [57]:
# Solution
ser1[~ser1.isin(ser2)]  # A[~A.isin(B)]

0    1
1    2
2    3
dtype: int64

**7. How to get the items not common to both series A and series B?**

In [50]:
# Input
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

In [59]:
# Solution
ser_u = pd.Series(np.union1d(ser1, ser2))      # union: the unique, sorted values that are in either of the two inputs
print('All items of both, without duplicates:','\n',ser_u,'\n')

ser_i = pd.Series(np.intersect1d(ser1, ser2))  # intersection: the sorted, unique values that are in both inputs
print('Items common to both:','\n',ser_i,'\n')

print('Items not common to both:')
ser_u[~ser_u.isin(ser_i)]                      # items of the union that are not in the intersection

All items of both, without duplicates: 
 0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
dtype: int64 

Items common to both: 
 0    4
1    5
dtype: int64 

Items not common to both:


0    1
1    2
2    3
5    6
6    7
7    8
dtype: int64
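
The same result can be reached in one step with NumPy's set exclusive-or. This is a sketch of an alternative, not the exercise's original solution.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# np.setxor1d returns the sorted, unique values that are in exactly one of the inputs
ser_xor = pd.Series(np.setxor1d(ser1.to_numpy(), ser2.to_numpy()))

print(ser_xor.tolist())  # [1, 2, 3, 6, 7, 8]
```

Unlike the union/intersection approach, the result here gets a fresh 0..5 index, because it is built from a new array rather than filtered from `ser_u`.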

**8. How to get the minimum, 25th percentile, median, 75th percentile, and maximum of a numeric series?**

In [22]:
# Input
ser = pd.Series(np.random.normal(10, 5, 25))

In [23]:
# Solution
np.percentile(ser, q=[0, 25, 50, 75, 100])

array([ 0.03855972,  4.12948116,  7.97636394, 12.1614982 , 16.871692  ])
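
pandas offers an equivalent through `Series.quantile`, which keeps the quantile levels as the index of the result. The sketch below uses a seeded generator (the seed is an arbitrary assumption) so that it can be rerun with the same numbers.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
ser = pd.Series(rng.normal(10, 5, 25))

# Series.quantile takes fractions in [0, 1]; np.percentile takes percentages in [0, 100]
q = ser.quantile([0, 0.25, 0.5, 0.75, 1])
p = np.percentile(ser, q=[0, 25, 50, 75, 100])

print(q)
print(np.allclose(q.to_numpy(), p))  # both use linear interpolation by default
```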

**9. How to get frequency counts of unique items of a series?**

In [24]:
# Input
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))

In [25]:
# Solution
ser.value_counts()

b    6
h    6
c    6
e    4
g    3
d    3
f    2
dtype: int64

**10. How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’?**

In [26]:
# Input
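# Note: this creates a RandomState object and discards it, so the global generator stays unseeded;
# call np.random.seed(100) instead if reproducible draws are needed
np.random.RandomState(100)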
ser = pd.Series(np.random.randint(1, 5, [12]))

In [27]:
# Solution
print("Top 2 Freq:", ser.value_counts())
ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
ser

Top 2 Freq: 3    4
2    4
1    3
4    1
dtype: int64


0         3
1         3
2     Other
3         3
4     Other
5         3
6     Other
7         2
8         2
9         2
10    Other
11        2
dtype: object
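
A non-mutating variant of the same idea uses `Series.where`, which leaves the original series untouched. This is a sketch under the assumption that a seeded `np.random.default_rng` generator is acceptable as input; it draws different numbers from the cell above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(100)          # seeded generator, so the draw is repeatable
ser = pd.Series(rng.integers(1, 5, 12))   # integers from 1 to 4 inclusive

top2 = ser.value_counts().index[:2]       # the two most frequent values

# Series.where keeps values where the condition is True and replaces the rest
result = ser.where(ser.isin(top2), "Other")

print(result)
```

When several values tie for second place, which of them lands in `top2` depends on the ordering returned by `value_counts()`.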

**11. How to bin a numeric series into 10 groups of equal size?**

In [29]:
# Input
ser = pd.Series(np.random.random(20))

In [30]:
# Solution
pd.qcut(ser, q=[0, .10, .20, .3, .4, .5, .6, .7, .8, .9, 1], 
        labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']).head()

0    10th
1     8th
2     3rd
3    10th
4     3rd
dtype: category
Categories (10, object): ['1st' < '2nd' < '3rd' < '4th' ... '7th' < '8th' < '9th' < '10th']
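
When custom labels are not needed, `pd.qcut` also accepts an integer number of quantiles. The sketch below (with an arbitrarily chosen seed) checks that the 20 values are split evenly across the 10 bins.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
ser = pd.Series(rng.random(20))

# q=10 asks for deciles; labels=False returns integer bin codes 0..9
bins = pd.qcut(ser, q=10, labels=False)

# Count how many values fall into each bin
print(bins.value_counts().sort_index())
```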

**12. How to convert a numpy array to a dataframe of given shape?**

In [31]:
# Input
ser = pd.Series(np.random.randint(1, 10, 35))

In [32]:
# Solution
df = pd.DataFrame(ser.values.reshape(7,5))
print(df)

   0  1  2  3  4
0  8  1  6  9  8
1  2  2  7  1  3
2  4  6  1  4  7
3  7  6  8  4  1
4  3  4  1  8  4
5  3  6  6  9  8
6  9  9  2  2  1


**13. How to find the positions of numbers that are multiples of 3 from a series?**

In [61]:
# Input
ser = pd.Series(np.random.randint(1, 10, 7))

In [62]:
# Solution
print(ser)

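# Convert to a NumPy array first; passing the Series directly makes np.argwhere raise
# "ValueError: Length of values (1) does not match length of index (7)" in this environment
print(np.argwhere(ser.to_numpy() % 3 == 0))

0    6
1    2
2    8
3    8
4    6
5    6
6    5
dtype: int32
[[0]
 [4]
 [5]]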
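
If a flat list of positions is preferred, `np.flatnonzero` on the underlying boolean array is one option. This sketch rebuilds the series from the values printed above so that it runs on its own.

```python
import numpy as np
import pandas as pd

ser = pd.Series([6, 2, 8, 8, 6, 6, 5])  # same values as the series printed above

mask = (ser % 3 == 0).to_numpy()   # plain boolean NumPy array
positions = np.flatnonzero(mask)   # positions where the mask is True

print(positions)  # [0 4 5]
```

Working on the NumPy array rather than the Series avoids any attempt to wrap the result back into a Series with the original index.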