Excercise - 3

nursnaaz · nursnaaz · commit 4f3c920cf4b8 · 2019-06-20T22:28:13.000+05:30
diff --git a/Chapter 1/Excercises/Excercise_3_Impute_missing_data.ipynb b/Chapter 1/Excercises/Excercise_3_Impute_missing_data.ipynb
@@ -0,0 +1,251 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Import the pandas library and load the dataset into a pandas DataFrame:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Read the CSV file into the DataFrame df\n",
+    "df = pd.read_csv('../Data/Banking_Marketing.csv', header=0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Print the number of missing values in each column. To do so, use the isna() function of the pandas DataFrame and sum the result"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "age               2\n",
+       "job               0\n",
+       "marital           0\n",
+       "education         0\n",
+       "default           0\n",
+       "housing           0\n",
+       "loan              0\n",
+       "contact           6\n",
+       "month             0\n",
+       "day_of_week       0\n",
+       "duration          7\n",
+       "campaign          0\n",
+       "pdays             0\n",
+       "previous          0\n",
+       "poutcome          0\n",
+       "emp_var_rate      0\n",
+       "cons_price_idx    0\n",
+       "cons_conf_idx     0\n",
+       "euribor3m         0\n",
+       "nr_employed       0\n",
+       "y                 0\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df.isna().sum()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3. Impute the missing values in the numerical age column with the column mean. To do so, first compute the mean of the age column with the mean() function, then fill the missing values with it using the fillna() function"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "40.023812413525256"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "mean_age = df.age.mean()\n",
+    "mean_age"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df['age'] = df['age'].fillna(mean_age)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4. Impute the missing values in the numerical duration column with the column median. To do so, first compute the median of the duration column with the median() function, then fill the missing values with that median using the fillna() function"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "180.0"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "median_duration = df.duration.median()\n",
+    "median_duration"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df['duration'] = df['duration'].fillna(median_duration)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5. Impute the missing values in the categorical contact column with the column mode. To do so, first find the mode of the contact column with the mode() function (it returns a Series, so take its first element), then fill the missing values with it using the fillna() function"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'cellular'"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "mode_contact = df.contact.mode()[0]\n",
+    "mode_contact"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df['contact'] = df['contact'].fillna(mode_contact)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6. Print the number of missing values in each column again to confirm that none remain. To do so, use the isna() function of the pandas DataFrame"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "age               0\n",
+       "job               0\n",
+       "marital           0\n",
+       "education         0\n",
+       "default           0\n",
+       "housing           0\n",
+       "loan              0\n",
+       "contact           0\n",
+       "month             0\n",
+       "day_of_week       0\n",
+       "duration          0\n",
+       "campaign          0\n",
+       "pdays             0\n",
+       "previous          0\n",
+       "poutcome          0\n",
+       "emp_var_rate      0\n",
+       "cons_price_idx    0\n",
+       "cons_conf_idx     0\n",
+       "euribor3m         0\n",
+       "nr_employed       0\n",
+       "y                 0\n",
+       "dtype: int64"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df.isna().sum()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
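
The imputation steps in this notebook can be condensed into one small, self-contained sketch. It does not read `Banking_Marketing.csv`; the four-row DataFrame and the `impute` helper are hypothetical stand-ins introduced only for illustration, reusing the notebook's column names `age`, `duration`, and `contact`. It assigns the result of `fillna()` back to the column rather than relying on `inplace=True` through attribute access, which is an assumption about what behaves most predictably across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for Banking_Marketing.csv.
df = pd.DataFrame({
    "age": [30.0, np.nan, 50.0, 40.0],
    "duration": [100.0, 200.0, np.nan, 1000.0],
    "contact": ["cellular", None, "telephone", "cellular"],
})


def impute(frame):
    """Return a copy with age, duration, and contact imputed (hypothetical helper)."""
    out = frame.copy()
    # Numerical column: fill with the mean of the observed values.
    out["age"] = out["age"].fillna(out["age"].mean())
    # Numerical column with an outlier: the median is less sensitive to it.
    out["duration"] = out["duration"].fillna(out["duration"].median())
    # Categorical column: mode() returns a Series, so take its first entry.
    out["contact"] = out["contact"].fillna(out["contact"].mode()[0])
    return out


imputed = impute(df)

# Missing values per column before and after imputation.
print(df.isna().sum())
print(imputed.isna().sum())
```

Working on a copy keeps the original `df` untouched, so the before and after missing-value counts can be compared side by side.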