# More Data Wrangling 6

In this notebook I demonstrate how to add readable labels to a categorical variable that has been recorded in the dataset as a numeric code, a dummy code, or some other string value. To do this I use the pandas map method, which looks up each value in a dictionary and returns the matching label.
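Before turning to the real data, here is a minimal sketch of how map behaves on a small made-up Series (the values and the `codebook` name are illustrative, not taken from the dataset). Note that the dictionary keys must match the stored values exactly, including their type: a value with no matching key becomes NaN.

```python
import pandas as pd

# Hypothetical toy codes, stored as strings; the last entry is a blank.
codes = pd.Series(['1', '2', '3', ' '])

# Hypothetical codebook: string keys, because the codes are strings.
codebook = {'1': 'low', '2': 'medium', '3': 'high'}

# Each value is looked up in the dictionary; ' ' has no key, so it maps to NaN.
labelled = codes.map(codebook)
print(labelled)

# Integer keys do not match the string codes, so every result is NaN.
mismatched = codes.map({1: 'low', 2: 'medium', 3: 'high'})
print(mismatched.isna().all())
```

This is why it is worth checking the stored values and their dtype before building the dictionary.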

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('fearofcrime.csv')

df.head()

Unnamed: 0,sex,anxlevel,stress,totalworry,construct
0,2,2,1.3,3.0375,3.04878048780488
1,2,2,2.1,3.21875,2.95121951219512
2,1,3,1.95,2.025,3.29268292682927
3,2,2,2.1,1.80625,2.19512195121951
4,2,2,2.05,2.5625,2.80487804878049


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 235 entries, 0 to 234
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   sex         235 non-null    object
 1   anxlevel    235 non-null    object
 2   stress      235 non-null    object
 3   totalworry  235 non-null    object
 4   construct   235 non-null    object
dtypes: object(5)
memory usage: 9.3+ KB


In [4]:
# Checking which unique values are recorded for the anxlevel variable.

df['anxlevel'].unique()

array(['2', '3', '1', ' '], dtype=object)

In [5]:
# Checking unique values for sex. As with anxlevel, a blank ' ' value appears alongside the valid codes.
df['sex'].unique()

array(['2', '1', ' '], dtype=object)
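The unique method shows which values occur but not how often. As a rough sketch on made-up data (the `toy` frame below is an assumption, not the fearofcrime file), value_counts can be used to see how many rows carry the blank code in each column.

```python
import pandas as pd

# Hypothetical toy frame with the same kind of blank ' ' entries.
toy = pd.DataFrame({
    'sex': ['2', '1', ' ', '2', '2'],
    'anxlevel': ['2', ' ', '3', '1', ' '],
})

# Frequency of every recorded value, including the blank code.
sex_counts = toy['sex'].value_counts()
anx_counts = toy['anxlevel'].value_counts()

print(sex_counts)
print(anx_counts)
```

Knowing the counts in advance makes it easier to judge whether the number of rows dropped later is plausible.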

In [6]:
# Creating a new dataframe object that only contains rows where neither anxlevel nor sex is ' ' (a single space, used here as a blank).
df_2 = df.loc[(df['anxlevel'] != ' ') & (df['sex'] != ' ')]

In [7]:
# Checking that the new data frame has fewer rows. 
df_2.shape

(227, 5)
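The filter above matches a single space exactly. If blanks could also appear as empty strings or as several spaces (an assumption; only ' ' is seen in this dataset), a slightly more tolerant version strips whitespace before comparing. The sketch below uses made-up data and hypothetical names.

```python
import pandas as pd

# Hypothetical toy frame with three kinds of blank: ' ', '  ' and ''.
toy = pd.DataFrame({
    'sex': ['2', '1', ' ', '2', ''],
    'anxlevel': ['2', '  ', '3', '1', '2'],
})

# True where either column is empty once surrounding whitespace is removed.
blank = toy['sex'].str.strip().eq('') | toy['anxlevel'].str.strip().eq('')

# Keep only the rows with a usable code in both columns.
clean = toy.loc[~blank]
print(clean.shape)
```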

In [8]:
df_2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 227 entries, 0 to 234
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   sex         227 non-null    object
 1   anxlevel    227 non-null    object
 2   stress      227 non-null    object
 3   totalworry  227 non-null    object
 4   construct   227 non-null    object
dtypes: object(5)
memory usage: 10.6+ KB


In [9]:
# Again checking unique values for the anxlevel variable. There are now only three, which is correct.  
df_2['anxlevel'].unique()

array(['2', '3', '1'], dtype=object)

In [10]:
# Sex also now contains the correct number of values. 
df_2['sex'].unique()

array(['2', '1'], dtype=object)

In [11]:
# Using map to append a new variable showing the labels for the anxlevel values. 

df_2['anx_label'] = df_2['anxlevel'].map({'1': 'low', '2': 'medium', '3': 'high'})

df_2.head(10)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_2['anx_label'] = df_2['anxlevel'].map({'1': 'low', '2': 'medium', '3': 'high'})


Unnamed: 0,sex,anxlevel,stress,totalworry,construct,anx_label
0,2,2,1.3,3.0375,3.04878048780488,medium
1,2,2,2.1,3.21875,2.95121951219512,medium
2,1,3,1.95,2.025,3.29268292682927,high
3,2,2,2.1,1.80625,2.19512195121951,medium
4,2,2,2.05,2.5625,2.80487804878049,medium
5,2,1,0.95,1.38125,4.21951219512195,low
6,1,1,1.25,0.9125,,low
7,2,1,1.9,1.575,,low
8,2,2,1.05,1.04375,4.31707317073171,medium
9,2,3,1.55,1.89375,3.26829268292683,high
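The warning above appears because df_2 was created by filtering df, so pandas cannot be sure whether the new column is meant for df_2 or for the original frame. The labels are still added to df_2, as the output shows. One common way to avoid the warning is to take an explicit copy when filtering. A minimal sketch on made-up data (the `toy` and `subset` names are illustrative):

```python
import pandas as pd

# Hypothetical toy frame standing in for df.
toy = pd.DataFrame({'anxlevel': ['2', '3', ' ', '1']})

# .copy() makes the filtered frame independent of the original,
# so adding a column to it is unambiguous.
subset = toy.loc[toy['anxlevel'] != ' '].copy()

subset['anx_label'] = subset['anxlevel'].map(
    {'1': 'low', '2': 'medium', '3': 'high'}
)
print(subset)
```

The original frame is left unchanged; only the copy gains the new column.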


In [12]:
# Likewise using map to append a new variable with the meaningful labels for sex. 
df_2['sex_label'] = df_2['sex'].map({'1': 'male', '2': 'female'})

df_2.head(10)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_2['sex_label'] = df_2['sex'].map({'1': 'male', '2': 'female'})


Unnamed: 0,sex,anxlevel,stress,totalworry,construct,anx_label,sex_label
0,2,2,1.3,3.0375,3.04878048780488,medium,female
1,2,2,2.1,3.21875,2.95121951219512,medium,female
2,1,3,1.95,2.025,3.29268292682927,high,male
3,2,2,2.1,1.80625,2.19512195121951,medium,female
4,2,2,2.05,2.5625,2.80487804878049,medium,female
5,2,1,0.95,1.38125,4.21951219512195,low,female
6,1,1,1.25,0.9125,,low,male
7,2,1,1.9,1.575,,low,female
8,2,2,1.05,1.04375,4.31707317073171,medium,female
9,2,3,1.55,1.89375,3.26829268292683,high,female
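The new label columns are plain strings, so sorting anx_label would order it alphabetically (high, low, medium). If the low < medium < high ordering matters for later analysis, one option is an ordered categorical. This is a sketch on made-up values, not a step from the original workflow.

```python
import pandas as pd

# Hypothetical labels like those produced by map above.
labels = pd.Series(['medium', 'high', 'low', 'medium'])

# Declare the categories in their meaningful order.
ordered = pd.Series(
    pd.Categorical(labels, categories=['low', 'medium', 'high'], ordered=True)
)

# Sorting now follows the declared order rather than the alphabet.
print(ordered.sort_values().tolist())
```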


In [13]:
df_2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 227 entries, 0 to 234
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   sex         227 non-null    object
 1   anxlevel    227 non-null    object
 2   stress      227 non-null    object
 3   totalworry  227 non-null    object
 4   construct   227 non-null    object
 5   anx_label   227 non-null    object
 6   sex_label   227 non-null    object
dtypes: object(7)
memory usage: 14.2+ KB
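The summary above still lists stress, totalworry and construct as object columns, although they hold numbers. A likely cause, and this is an assumption, is that blank entries in those columns made pandas read them as strings. A hedged sketch of one way to convert such a column, using made-up values:

```python
import pandas as pd

# Hypothetical numeric column read in as strings, with one blank entry.
stress = pd.Series(['1.3', '2.1', ' ', '1.95'])

# errors='coerce' turns anything that cannot be parsed into NaN.
stress_num = pd.to_numeric(stress, errors='coerce')

print(stress_num.dtype)
print(stress_num.isna().sum())
```

After conversion, numeric summaries such as mean skip the NaN entries by default.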
