# Code Snippets

A collection of small pandas code samples.

In [30]:
import pandas as pd
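# numpy.NaN was removed in NumPy 2.0; importing nan under the old name works on all versions
from numpy import nan as NaN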

### Add a Column to a DataFrame

Add a column to a DataFrame object using the pd.concat() function. To make this work, we need to:

1. Create a dataframe of a single column, with the column name specified by the user;
2. Align the index of the new dataframe with the dataframe we are appending to;
3. Call concat().
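The three steps can be sketched on a tiny example. This is only an illustration: the names `base`, `extra`, `aligned` and `misaligned` and the data are hypothetical and do not appear in the snippets below. It also shows why step 2 matters: concat() matches rows by index label, so a new dataframe with a different index produces extra rows filled with NaN.

```python
import pandas as pd

# Hypothetical example data, used only for this sketch.
base = pd.DataFrame({'Score': [88, 90]}, index=[10, 20])

# Step 1: a single-column dataframe with the desired column name.
# Step 2: reuse base.index so the rows line up label by label.
extra = pd.DataFrame({'Passed': [True, True]}, index=base.index)

# Step 3: concatenate along the column axis.
aligned = pd.concat([base, extra], axis=1)

# Skipping step 2 gives the new frame a default 0..n-1 index. concat() then
# takes the union of the labels and fills the gaps with NaN.
misaligned = pd.concat([base, pd.DataFrame({'Passed': [True, True]})], axis=1)

print(aligned)
print(misaligned)
```

With aligned indexes the result keeps two rows; with mismatched indexes it grows to four rows, two of which have no Score and two no Passed value.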

#### Add a List of Fixed Values As a Column

In [16]:
from itertools import repeat

"""
    Add a column of fixed value to a dataframe
    
    [String] column (column name), [Object] value (any value type that can be put into a dataframe), [DataFrame] df
        => [DataFrame] df (new DataFrame with a column appended)
"""
addColumn = lambda column, value, df: \
    pd.concat( [ df
               , pd.DataFrame(repeat([value], len(df)), columns=[column], index=df.index)
               ]
             , axis=1
             )

Try it out

In [17]:
df = pd.DataFrame( [[88, 3], [90, 1], [75, 10]]
                 , columns=['Score', 'Rank']
                 , index=['Isaac', 'Yan', 'Chuyi'])
df

       Score  Rank
Isaac     88     3
Yan       90     1
Chuyi     75    10


In [18]:
addColumn('Date', '2020-05-16', df)

       Score  Rank        Date
Isaac     88     3  2020-05-16
Yan       90     1  2020-05-16
Chuyi     75    10  2020-05-16
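For comparison, pandas also offers DataFrame.assign(), which broadcasts a scalar to every row and returns a new dataframe. The sketch below is an alternative to the concat() approach rather than a replacement for it; the variable names `scores` and `with_date` are made up for this example.

```python
import pandas as pd

scores = pd.DataFrame([[88, 3], [90, 1], [75, 10]],
                      columns=['Score', 'Rank'],
                      index=['Isaac', 'Yan', 'Chuyi'])

# assign() leaves `scores` untouched and returns a copy with the new column.
# The scalar value is repeated for every row.
with_date = scores.assign(Date='2020-05-16')

print(with_date)
```

Like addColumn, this does not modify the original dataframe.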


#### Add a Series Object As a Column

In [19]:
"""
    Add a Series Object as a column to a dataframe
    
    The index of the series object is ignored, but its length must equal the length
    of the dataframe being appended to; otherwise an exception is raised.
    
    [String] column (column name), [Series] s, [DataFrame] df
        => [DataFrame] df (new DataFrame with series s as a column appended)
"""
addSeriesColumn = lambda column, s, df: \
    pd.concat( [ df
               , pd.DataFrame({column: s.values}, index=df.index)
               ]
             , axis=1
             )

In [20]:
dates = pd.Series(['2020-05-16', '2020-05-17', '2020-05-18'])
dates

0    2020-05-16
1    2020-05-17
2    2020-05-18
dtype: object

In [21]:
addSeriesColumn('Date', dates, df)

       Score  Rank        Date
Isaac     88     3  2020-05-16
Yan       90     1  2020-05-17
Chuyi     75    10  2020-05-18


In [22]:
# This call fails because the series (length 2) is shorter than the dataframe (length 3)
# addSeriesColumn( 'Date'
#                , pd.Series(['2020-05-16', '2020-05-17'])
#                , df)
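The failure can be observed safely by catching the exception. The sketch below assumes that pandas reports the length mismatch as a ValueError, which is the behaviour in the versions I am aware of; the names `scores`, `short` and `failed` are hypothetical.

```python
import pandas as pd

scores = pd.DataFrame({'Score': [88, 90, 75]},
                      index=['Isaac', 'Yan', 'Chuyi'])
short = pd.Series(['2020-05-16', '2020-05-17'])  # only two values

try:
    # Same construction that addSeriesColumn uses internally.
    pd.DataFrame({'Date': short.values}, index=scores.index)
    failed = False
except ValueError as err:
    # Assumption: the mismatch surfaces as a ValueError.
    failed = True
    print('Could not build the column:', err)
```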

#### Create a New Series Then Add

Sometimes we derive a series from an existing dataframe and add it as a new column. A series created this way inherits the dataframe's index, so we don't have to align the indexes ourselves. There are two steps:

1. Create a new series;
2. Combine the new series with the dataframe.

In [23]:
df

       Score  Rank
Isaac     88     3
Yan       90     1
Chuyi     75    10


In [26]:
# create a new column
comments = df['Rank'].apply(lambda x: 'Excellent' if x < 6 else 'Good')
comments

Isaac    Excellent
Yan      Excellent
Chuyi         Good
Name: Rank, dtype: object

In [27]:
# combine the new column
pd.concat([df, pd.DataFrame({'Comments': comments})], axis=1)

       Score  Rank   Comments
Isaac     88     3  Excellent
Yan       90     1  Excellent
Chuyi     75    10       Good
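The reason no index handling is needed here can be checked directly. The following sketch, with the hypothetical names `scores`, `same_index`, `reversed_comments` and `combined`, confirms that the derived series carries the dataframe's index, and that concat() still matches rows by label even when the series is reordered.

```python
import pandas as pd

scores = pd.DataFrame({'Rank': [3, 1, 10]}, index=['Isaac', 'Yan', 'Chuyi'])
comments = scores['Rank'].apply(lambda x: 'Excellent' if x < 6 else 'Good')

# The derived series has exactly the same index as the dataframe.
same_index = comments.index.equals(scores.index)

# Reverse the row order; the labels travel with the values.
reversed_comments = comments.iloc[::-1].rename('Comments')

# concat() aligns on labels, so each comment lands on the right row.
combined = pd.concat([scores, reversed_comments], axis=1)

print(same_index)
print(combined.loc['Chuyi', 'Comments'])
```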


## Count NaN Values

In [34]:
# isnull() yields a boolean mask; summing it counts the True (NaN) entries
countNan = lambda v: v.isnull().sum()

Let's count the number of NaN values in a Series:

In [35]:
s = pd.Series([1, 2, NaN, 0])
s

0    1.0
1    2.0
2    NaN
3    0.0
dtype: float64

In [36]:
countNan(s)

1

In [38]:
df = pd.DataFrame([[1, NaN], [2, 3], [NaN, NaN]], columns=['value1', 'value2'])
df

   value1  value2
0     1.0     NaN
1     2.0     3.0
2     NaN     NaN


In [39]:
df.apply(countNan)

value1    1
value2    2
dtype: int64
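The same per-column counts are available without a helper, because DataFrame.isna() returns a boolean mask that can be summed directly. This is a minimal sketch of that built-in route; the names `frame`, `per_column` and `total` are hypothetical.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[1, np.nan], [2, 3], [np.nan, np.nan]],
                     columns=['value1', 'value2'])

# Summing the boolean mask column by column counts the NaN entries.
per_column = frame.isna().sum()

# Summing once more gives the total for the whole dataframe.
total = int(per_column.sum())

print(per_column)
print(total)
```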

### Count NaN Percentage

In [43]:
# Despite the name, this returns a fraction between 0 and 1 (0.25 means 25%)
countNanPercent = lambda v: countNan(v)/v.size

In [41]:
countNanPercent(s)

0.25

In [42]:
df.apply(countNanPercent)

value1    0.333333
value2    0.666667
dtype: float64
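Since the mean of a boolean mask is the share of True values, the NaN fraction can also be computed with isna().mean(), and multiplying by 100 turns it into an actual percentage. The sketch below uses the hypothetical names `frame`, `fraction` and `percent`.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[1, np.nan], [2, 3], [np.nan, np.nan]],
                     columns=['value1', 'value2'])

# Mean of the boolean mask = fraction of NaN values per column.
fraction = frame.isna().mean()

# Scale to a percentage in the range 0..100.
percent = fraction * 100

print(fraction)
print(percent.round(1))
```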