**1. How to import pandas and check the version?**

In [1]:
import pandas as pd
print(pd.__version__)

2.0.3


**2. How to create a series from a list, numpy array and dict?**

Create a pandas series from each of the items below: a list, a NumPy array, and a dictionary.

*Input:*

In [2]:
import numpy as np
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))

In [3]:
s1 = pd.Series(mylist)
s2 = pd.Series(myarr)
s3 = pd.Series(mydict)

s3.head()

a    0
b    1
c    2
e    3
d    4
dtype: int64

**3. How to convert the index of a series into a column of a dataframe?**

Convert the series ser into a dataframe, with its index as a separate column.

*Input:*

In [4]:
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))
ser = pd.Series(mydict)

In [5]:
df = ser.to_frame().reset_index()
df.head()

  index  0
0     a  0
1     b  1
2     c  2
3     e  3
4     d  4
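The default column names in the result (`index` and `0`) are not very descriptive. As a minimal sketch, both can be named during the conversion. The names `letter` and `position` are illustrative assumptions, not part of the original exercise.

```python
import numpy as np
import pandas as pd

letters = list('abcde')
ser = pd.Series(np.arange(5), index=letters)

# rename_axis names the index; reset_index(name=...) names the value column
df = ser.rename_axis('letter').reset_index(name='position')
print(df)
```

This avoids a separate renaming step after `to_frame().reset_index()`.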


**4. How to combine many series to form a dataframe?**

Combine ser1 and ser2 to form a dataframe.

*Input:*

In [6]:
import numpy as np
ser1 = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser2 = pd.Series(np.arange(26))

In [7]:
df1 = pd.concat([ser1, ser2], axis=1)
print(df1.head())

df2 = pd.DataFrame({'first': ser1, 'second': ser2})
print(df2.head())

   0  1
0  a  0
1  b  1
2  c  2
3  e  3
4  d  4
  first  second
0     a       0
1     b       1
2     c       2
3     e       3
4     d       4


**5. How to assign a name to a series?**

Name the series ser ‘alphabets’.

*Input:*

In [8]:
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))

In [9]:
ser.name = 'alphabets'
ser.head()

0    a
1    b
2    c
3    e
4    d
Name: alphabets, dtype: object
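A series name and an index name are two separate attributes, which is easy to mix up. The sketch below sets both without mutating the original; the index label `position` is a hypothetical name chosen for illustration.

```python
import pandas as pd

ser = pd.Series(list('abc'))

# rename(scalar) sets the series name; rename_axis sets the index name.
# Both return new objects, so ser itself stays unnamed.
named = ser.rename('alphabets').rename_axis('position')
print(named)
```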

**6. How to get the items of series A not present in series B?**

From ser1, remove the items that are present in ser2.

*Input:*

In [10]:
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

In [11]:
ser1[~ser1.isin(ser2)]

0    1
1    2
2    3
dtype: int64

**7. How to get the items not common to both series A and series B?**

Get all items of ser1 and ser2 not common to both.

*Input:*

In [12]:
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

In [14]:
s_union = pd.Series(np.union1d(ser1, ser2))
s_intersect = pd.Series(np.intersect1d(ser1, ser2))
s_res = s_union[~s_union.isin(s_intersect)]
s_res.head(10)

0    1
1    2
2    3
5    6
6    7
7    8
dtype: int64
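The union-minus-intersection result above is the symmetric difference of the two series. As an alternative sketch, NumPy's `np.setxor1d` computes it in one call. Note that the resulting series gets a fresh 0-based index rather than the gapped index shown above.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# setxor1d returns the sorted values that appear in exactly one of the inputs
s_res = pd.Series(np.setxor1d(ser1.to_numpy(), ser2.to_numpy()))
print(s_res.tolist())
```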

**8. How to get the minimum, 25th percentile, median, 75th percentile, and maximum of a numeric series?**

Compute the minimum, 25th percentile, median, 75th percentile, and maximum of ser.

*Input:*

In [30]:
ser = pd.Series(np.random.normal(10, 5, 25))

In [31]:
print(ser)
np.percentile(ser, q=[0, 25, 50, 75, 100])

0     15.721083
1      5.445931
2      2.565557
3     15.577697
4      0.172290
5      6.615373
6      9.579899
7      9.931824
8      5.816030
9      7.583420
10    15.384337
11     6.207524
12     9.480865
13     7.292446
14     3.026204
15     8.892117
16     9.243202
17    12.466858
18     6.627416
19     9.657901
20    11.169798
21    -6.971233
22    10.262689
23    13.717888
24    14.949341
dtype: float64


array([-6.97123321,  6.20752423,  9.24320237, 11.16979775, 15.72108293])
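The same five-number summary is available directly on the series through `Series.quantile`, which takes fractions in [0, 1] rather than percentages. The sketch below uses fixed values (an assumption, so the result is reproducible) and checks that both approaches agree.

```python
import numpy as np
import pandas as pd

ser = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])  # fixed values instead of random draws

via_numpy = np.percentile(ser, q=[0, 25, 50, 75, 100])
via_pandas = ser.quantile([0, 0.25, 0.5, 0.75, 1.0])  # indexed by the quantile
print(via_pandas)
```

One practical difference: `quantile` skips missing values, whereas `np.percentile` returns NaN if the input contains any.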

**9. How to get frequency counts of unique items of a series?**

Calculate the frequency counts of each unique value in ser.

*Input:*

In [32]:
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))

In [33]:
print(ser)
ser.value_counts()

0     d
1     c
2     h
3     g
4     h
5     f
6     e
7     b
8     d
9     e
10    b
11    c
12    c
13    c
14    c
15    h
16    d
17    h
18    a
19    g
20    d
21    d
22    d
23    g
24    f
25    c
26    d
27    g
28    a
29    e
dtype: object


d    7
c    6
h    4
g    4
e    3
f    2
b    2
a    2
Name: count, dtype: int64

**10. How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’?**

From ser, keep the top 2 most frequent items as they are and replace everything else with ‘Other’.

*Input:*

In [40]:
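# Note: this RandomState object is discarded, so the draw below is not seeded; np.random.seed(100) would seed it
np.random.RandomState(100)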
ser = pd.Series(np.random.randint(1, 5, [12]))

In [41]:
print(ser)
print(ser.value_counts())
ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
print(ser)
ser.value_counts()

0     2
1     2
2     1
3     3
4     3
5     1
6     1
7     4
8     4
9     4
10    3
11    1
dtype: int64
1    4
3    3
4    3
2    2
Name: count, dtype: int64
0     Other
1     Other
2         1
3         3
4         3
5         1
6         1
7     Other
8     Other
9     Other
10        3
11        1
dtype: object


Other    5
1        4
3        3
Name: count, dtype: int64
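The in-place assignment above writes strings into an integer series and relies on pandas upcasting it to `object`, which may emit a warning in newer pandas releases. A non-mutating sketch using `Series.where` is shown below, with fixed, tie-free input data (an assumption made so the result is deterministic).

```python
import pandas as pd

ser = pd.Series([1, 1, 1, 1, 3, 3, 3, 2, 2, 4])  # counts: 1 -> 4, 3 -> 3, 2 -> 2, 4 -> 1

# value_counts sorts by frequency (descending), so the first two labels are the top 2
top2 = ser.value_counts().index[:2]

# where keeps values for which the condition holds and replaces the rest
result = ser.where(ser.isin(top2), 'Other')
print(result.tolist())
```

When two values tie for second place, as 3 and 4 do in the output above, which one is kept depends on the order `value_counts` returns them in.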

**11. How to bin a numeric series to 10 groups of equal size?**

Bin the series ser into 10 equal-sized groups (deciles) and replace each value with its bin name.

*Input:*

In [42]:
ser = pd.Series(np.random.random(20))

*Desired Output:*

In [None]:
# First 5 items
0    7th
1    9th
2    7th
3    3rd
4    8th
dtype: category
Categories (10, object): [1st < 2nd < 3rd < 4th ... 7th < 8th < 9th < 10th]

In [45]:
print(ser)
pd.qcut(ser, q=[0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1], labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th'])

0     0.045803
1     0.057589
2     0.182752
3     0.844483
4     0.228431
5     0.667206
6     0.029774
7     0.249181
8     0.861967
9     0.813081
10    0.339266
11    0.798056
12    0.206013
13    0.268454
14    0.536524
15    0.844789
16    0.685825
17    0.972061
18    0.270653
19    0.559700
dtype: float64


0      1st
1      2nd
2      2nd
3      9th
4      3rd
5      7th
6      1st
7      4th
8     10th
9      8th
10     5th
11     8th
12     3rd
13     4th
14     6th
15     9th
16     7th
17    10th
18     5th
19     6th
dtype: category
Categories (10, object): ['1st' < '2nd' < '3rd' < '4th' ... '7th' < '8th' < '9th' < '10th']
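Passing an integer to `q` asks `pd.qcut` for that many equal-frequency bins, so the explicit list of quantile edges can be shortened to `q=10`. The sketch below assumes fixed input values (0 through 19) so that every decile holds exactly two items.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(20, dtype=float))  # fixed values instead of random draws

labels = ['1st', '2nd', '3rd'] + [f'{i}th' for i in range(4, 11)]

# q=10 is equivalent to the explicit edges [0, 0.1, ..., 1]
binned = pd.qcut(ser, q=10, labels=labels)
print(binned.value_counts(sort=False))
```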

**12. How to reshape a series into a dataframe of a given shape?**

Reshape the series ser into a dataframe with 7 rows and 5 columns.

*Input:*

In [46]:
ser = pd.Series(np.random.randint(1, 10, 35))

In [50]:
print(ser)
df = pd.DataFrame(ser.values.reshape(7,5))
df

0     3
1     7
2     2
3     7
4     3
5     9
6     2
7     6
8     6
9     4
10    5
11    2
12    6
13    5
14    5
15    5
16    6
17    2
18    9
19    8
20    4
21    2
22    6
23    5
24    9
25    6
26    4
27    5
28    5
29    4
30    2
31    7
32    1
33    1
34    1
dtype: int64


   0  1  2  3  4
0  3  7  2  7  3
1  9  2  6  6  4
2  5  2  6  5  5
3  5  6  2  9  8
4  4  2  6  5  9
5  6  4  5  5  4
6  2  7  1  1  1


**13. How to find the positions of numbers that are multiples of 3 from a series?**

Find the positions of numbers that are multiples of 3 from ser.

*Input:*

In [53]:
ser = pd.Series(np.random.randint(1, 10, 7))

In [54]:
print(ser)
np.argwhere(ser % 3 == 0)

0    6
1    9
2    8
3    2
4    9
5    9
6    1
dtype: int64


array([[0],
       [1],
       [4],
       [5]])
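`np.argwhere` returns a column of positions, one row per match. If a flat array is more convenient, `np.flatnonzero` on the boolean mask is one option. The sketch below reuses the values printed above as fixed input.

```python
import numpy as np
import pandas as pd

ser = pd.Series([6, 9, 8, 2, 9, 9, 1])  # the values shown in the output above

# Convert the boolean mask to a plain array, then take the positions of True
mask = (ser % 3 == 0).to_numpy()
positions = np.flatnonzero(mask)
print(positions)
```

These are integer positions, not index labels; they coincide here only because the series has the default 0-based index.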

**14. How to extract items at given positions from a series?**

From ser, extract the items at positions in list pos.

*Input:*

In [55]:
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]

In [59]:
ser.take(pos)

0     a
4     e
8     i
14    o
20    u
dtype: object

**15. How to stack two series vertically and horizontally?**

Stack ser1 and ser2 vertically and horizontally (to form a dataframe).

*Input:*

In [60]:
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

In [62]:
# Vertical
# print(ser1.append(ser2)) -> only for pandas < 2.0; Series.append was removed in 2.0
df_v = pd.concat([ser1, ser2], axis=0)
print(df_v)

# Horizontal
df_h = pd.concat([ser1, ser2], axis=1)
print(df_h)

0    0
1    1
2    2
3    3
4    4
0    a
1    b
2    c
3    d
4    e
dtype: object
   0  1
0  0  a
1  1  b
2  2  c
3  3  d
4  4  e
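In the output above, the vertical stack repeats the index labels 0 to 4, and the horizontal stack has the default column names 0 and 1. As a small sketch, `ignore_index=True` renumbers the stacked rows and `keys` names the columns; the column names `number` and `letter` are illustrative assumptions.

```python
import pandas as pd

ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

# Vertical: renumber the index 0..9 instead of repeating 0..4
stacked = pd.concat([ser1, ser2], axis=0, ignore_index=True)

# Horizontal: keys become the column names
side_by_side = pd.concat([ser1, ser2], axis=1, keys=['number', 'letter'])

print(stacked)
print(side_by_side)
```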


**16. How to get the positions of items of series A in another series B?**

Get the positions of items of ser2 in ser1 as a list.

*Input:*

In [63]:
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

In [68]:
print(ser1[ser1.isin(ser2)].index)

# OR
print([np.where(i == ser1)[0].tolist()[0] for i in ser2])

# OR
print([pd.Index(ser1).get_loc(i) for i in ser2])

Index([0, 4, 5, 8], dtype='int64')
[5, 4, 0, 8]
[5, 4, 0, 8]
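The first approach above returns positions in ser1's order, while the two list comprehensions follow ser2's order. A vectorized sketch that also follows ser2's order is `Index.get_indexer`, which requires the values of ser1 to be unique and reports -1 for anything it cannot find.

```python
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

lookup = pd.Index(ser1)  # values of ser1 as an Index; must be unique for get_indexer

positions = lookup.get_indexer(ser2).tolist()
missing = lookup.get_indexer([99]).tolist()  # 99 is not in ser1

print(positions)
print(missing)
```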


**17. How to compute the mean squared error on a truth and predicted series?**

Compute the mean squared error of truth and pred series.

*Input:*

In [69]:
truth = pd.Series(range(10))
pred = pd.Series(range(10)) + np.random.random(10)

In [73]:
mse = np.mean((pred - truth) ** 2)
print(truth)
print(pred)
print(mse)

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
0    0.436980
1    1.833744
2    2.830080
3    3.555479
4    4.325862
5    5.369324
6    6.835949
7    7.036872
8    8.079413
9    9.896653
dtype: float64
0.36367203666938164
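Because pred is random, the value above changes on every run. A small worked example with fixed numbers (chosen here for illustration) makes the arithmetic easy to check by hand: the squared errors are 0.25, 0, 0.25 and 0, so their mean is 0.125.

```python
import pandas as pd

truth = pd.Series([0.0, 1.0, 2.0, 3.0])
pred = pd.Series([0.5, 1.0, 1.5, 3.0])  # fixed predictions instead of random noise

squared_errors = (pred - truth) ** 2
mse = squared_errors.mean()  # same result as np.mean(...) on the series
print(mse)
```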


**18. How to convert the first character of each element in a series to uppercase?**

Convert the first character of each word in ser to uppercase.

*Input:*

In [74]:
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

In [78]:
ser1 = pd.Series([i.title() for i in ser])

# OR
ser2 = ser.map(lambda x: x.title())

# OR
ser3 = ser.map(lambda x: x[0].upper() + x[1:])

print(ser1)
print(ser2)
print(ser3)

0     How
1      To
2    Kick
3    Ass?
dtype: object
0     How
1      To
2    Kick
3    Ass?
dtype: object
0     How
1      To
2    Kick
3    Ass?
dtype: object
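The three approaches agree on this input, but they are not equivalent in general: `title()` capitalizes every word and lowercases the remaining letters, while the slicing approach touches only the first character. The sketch below uses the vectorized `.str` accessor on two illustrative strings (not from the original exercise) to show the difference.

```python
import pandas as pd

ser = pd.Series(['hello world', 'iPhone'])

titled = ser.str.title()                        # every word capitalized, rest lowercased
capitalized = ser.str.capitalize()              # first character upper, rest lowercased
first_only = ser.str[:1].str.upper() + ser.str[1:]  # only the first character changes

print(titled.tolist())
print(capitalized.tolist())
print(first_only.tolist())
```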


**19. How to calculate the number of characters in each word in a series?**

*Input:*

In [79]:
ser = pd.Series(['how', 'to', 'kick', 'ass?'])

In [81]:
ser.map(lambda x: len(x))

0    3
1    2
2    4
3    4
dtype: int64
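An alternative sketch uses the vectorized `Series.str.len`, which returns NaN for missing entries instead of raising the `TypeError` that `len` would. The `None` entry below is added for illustration.

```python
import pandas as pd

ser = pd.Series(['how', 'to', None])  # includes a missing value on purpose

lengths = ser.str.len()  # NaN where the value is missing
print(lengths)
```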

**20. How to compute the difference of differences between consecutive numbers of a series?**

Compute the difference of differences between the consecutive numbers of ser.

*Input:*

In [82]:
ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

*Desired Output:*

In [None]:
[nan, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 8.0]
[nan, nan, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0]

In [86]:
print(ser.diff().tolist())
print(ser.diff().diff().tolist())

[nan, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0, 8.0]
[nan, nan, 1.0, 1.0, 1.0, 1.0, 0.0, 2.0]
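If the leading NaN padding is not wanted, `np.diff` with `n=2` is one alternative sketch. It applies the difference twice and returns an array two elements shorter than the input.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 3, 6, 10, 15, 21, 27, 35])

# n=2 takes the difference of the differences; no NaN padding is added
second_diff = np.diff(ser.to_numpy(), n=2)
print(second_diff.tolist())
```

The values match `ser.diff().diff()` above once its first two NaN entries are dropped.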
