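### pandas' applymap()
The `DataFrame.applymap()` method applies a function that accepts and returns a scalar to every element of a DataFrame. In pandas 2.1 and later the same method is available as `DataFrame.map()`, and `applymap()` is deprecated.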

Sounds interesting, let's see how this works.

In [1]:
import pandas as pd 

nba = pd.read_csv("nba.csv") 
nba.head(10)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
7,Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0
8,Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
9,Marcus Smart,Boston Celtics,36.0,PG,22.0,6-4,220.0,Oklahoma State,3431040.0


In [2]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      457 non-null    object 
 1   Team      457 non-null    object 
 2   Number    457 non-null    float64
 3   Position  457 non-null    object 
 4   Age       457 non-null    float64
 5   Height    457 non-null    object 
 6   Weight    457 non-null    float64
 7   College   373 non-null    object 
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)
memory usage: 32.3+ KB


In [3]:
# count the nulls in each column that has at least one
null_columns=nba.columns[nba.isnull().any()]
nba[null_columns].isnull().sum()

Name         1
Team         1
Number       1
Position     1
Age          1
Height       1
Weight       1
College     85
Salary      12
dtype: int64

What do we know about our dataset at this point?
   * we have missing values in all of the columns of the dataset
   * the missing values are actually null (NaN) - no need to convert anything
   * it's interesting that the first seven columns are each missing exactly one value; do we have an empty row?

Let's check whether we do have a completely empty row. We can do that by displaying the row where 'Name' is null.

In [4]:
print(nba[nba['Name'].isnull()][null_columns])

    Name Team  Number Position  Age Height  Weight College  Salary
457  NaN  NaN     NaN      NaN  NaN    NaN     NaN     NaN     NaN


Yep! The last row in the dataset is completely empty. We will remove this row.

In [5]:
# axis=0 -> row, how='all' -> all values are nan
nba.dropna(axis=0,how='all',inplace=True)
nba.shape

(457, 9)
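
To make the role of `how='all'` concrete, here is a minimal sketch on a made-up three-row frame (the `toy` data is hypothetical, not taken from nba.csv). It contrasts `how='all'`, which drops only rows where every value is NaN, with `how='any'`, which drops rows that have at least one NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 1 is partly missing, row 2 is completely empty.
toy = pd.DataFrame({
    "Name": ["A", "B", np.nan],
    "Salary": [1.0, np.nan, np.nan],
})

# how='all' removes only the completely empty row.
only_empty_rows_removed = toy.dropna(axis=0, how="all")

# how='any' also removes the row that is merely missing its Salary.
any_missing_removed = toy.dropna(axis=0, how="any")

print(only_empty_rows_removed.shape)  # (2, 2)
print(any_missing_removed.shape)      # (1, 2)
```

This is why `how='all'` is the safer choice here: players with a missing College or Salary stay in the dataset.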

In [6]:
# re-count the nulls in each column
null_columns=nba.columns[nba.isnull().any()]
nba[null_columns].isnull().sum()

College    84
Salary     11
dtype: int64

Now we can start to play with applymap().

Let's say we wanted to: <br>
   1) convert all elements in our dataframe to a str() <br>
   2) then replace that element with the len() of that element <br>

Here's an example of what we are trying to do:<br>
<img align="left" style="padding-right:10px;" src="sample_convert.png" width=500><br>

`applymap()` makes this a very simple task!

In [7]:
nba = nba.applymap(lambda x: len(str(x)))

Not too bad!  Also notice that all the NaN values from above were replaced with a 3.  How come?  

Each NaN was first converted with str(), meaning NaN -> 'nan', and len('nan') is 3.
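
The NaN behaviour can be checked in isolation with a small sketch. The `toy` frame and the `elementwise` name below are illustrative assumptions, not part of the notebook. Because `applymap()` was renamed to `map()` in pandas 2.1, the sketch picks whichever method the installed version provides.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with one missing Salary.
toy = pd.DataFrame({
    "Name": ["Avery Bradley", "John Holland"],
    "Salary": [7730337.0, np.nan],
})

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = toy.map if hasattr(pd.DataFrame, "map") else toy.applymap

# Convert every element to a string, then replace it with the string's length.
lengths = elementwise(lambda x: len(str(x)))

print(str(np.nan))  # nan
print(lengths)
```

The missing Salary becomes 3 because `str(np.nan)` is the three-character string `'nan'`, while `7730337.0` becomes 9.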

In [8]:
nba.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,13,14,3,2,4,3,5,5,9
1,11,14,4,2,4,3,5,9,9
2,12,14,4,2,4,3,5,17,3
3,11,14,4,2,4,3,5,13,9
4,13,14,3,2,4,4,5,3,9


In [9]:
# count the nulls in each column (none should remain)
null_columns=nba.columns[nba.isnull().any()]
nba[null_columns].isnull().sum()

Series([], dtype: float64)

In [10]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 457 entries, 0 to 456
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Name      457 non-null    int64
 1   Team      457 non-null    int64
 2   Number    457 non-null    int64
 3   Position  457 non-null    int64
 4   Age       457 non-null    int64
 5   Height    457 non-null    int64
 6   Weight    457 non-null    int64
 7   College   457 non-null    int64
 8   Salary    457 non-null    int64
dtypes: int64(9)
memory usage: 35.7 KB


#### Now it's your turn to try this out!
Use the university_town.txt file to perform the following exercise.

The data file appears to be a list that contains states, regions and university names.<br>
Format: <br>
    state_name\[edit\] <br>
    region_name (University)\[numeric\] <br>
    . <br>
    . <br>
    . <br>
    region_name (University)\[numeric\]

#### Exercise:
Your challenge is to do something similar to the above exercise. Here are some hints to help you out: <br>
   * parse the file by lines and load into an array
   * load the file_array into a pandas dataframe
   * create a UDF (user defined function) that will clean the data
   * use applymap() in conjunction with your UDF to produce a dataframe that looks like the following

<img align="left" style="padding-right:10px;" src="output_df.png" width=200><br>
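
As a starting point, the hints above could be sketched roughly as follows. Everything here is an assumption for illustration: the `sample_lines` stand in for the real file contents, the `clean` function is one possible UDF, and the result is not guaranteed to match the target layout shown in the image.

```python
import re
import pandas as pd

# Hypothetical lines in the documented format, standing in for university_town.txt.
sample_lines = [
    "Alabama[edit]",
    "Auburn (Auburn University)[1]",
    "Florence (University of North Alabama)",
]

# Load the parsed lines into a single-column DataFrame.
frame = pd.DataFrame({"raw": sample_lines})

def clean(value):
    """Strip bracketed markers and any parenthesised university name."""
    text = re.sub(r"\[.*?\]", "", str(value))  # drop [edit], [1], ...
    text = re.sub(r"\s*\(.*", "", text)        # drop everything from the first "("
    return text.strip()

# DataFrame.applymap was renamed to DataFrame.map in pandas 2.1.
elementwise = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap
cleaned = elementwise(clean)

print(cleaned)
```

Telling state lines apart from region lines (for example, by checking for the `[edit]` marker before cleaning) is left as part of the exercise.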

In [4]:
txtfile = open('university_town.txt')

txtfile.readline()
uni = []
