Commit 4f3c920

Exercise - 3
1 parent 495d816 commit 4f3c920

1 file changed

+251
-0
lines changed

@@ -0,0 +1,251 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"### 1. Import the pandas library and load the dataset into a pandas DataFrame:"
8+
]
9+
},
10+
{
11+
"cell_type": "code",
12+
"execution_count": 2,
13+
"metadata": {},
14+
"outputs": [],
15+
"source": [
16+
"import pandas as pd\n",
17+
"\n",
18+
"# Read the CSV file into the DataFrame df\n",
19+
"df = pd.read_csv('../Data/Banking_Marketing.csv', header=0)"
20+
]
21+
},
22+
{
23+
"cell_type": "markdown",
24+
"metadata": {},
25+
"source": [
26+
"### 2. Print the number of missing values in each column, using the isna() method of the pandas DataFrame"
27+
]
28+
},
29+
{
30+
"cell_type": "code",
31+
"execution_count": 3,
32+
"metadata": {},
33+
"outputs": [
34+
{
35+
"data": {
36+
"text/plain": [
37+
"age 2\n",
38+
"job 0\n",
39+
"marital 0\n",
40+
"education 0\n",
41+
"default 0\n",
42+
"housing 0\n",
43+
"loan 0\n",
44+
"contact 6\n",
45+
"month 0\n",
46+
"day_of_week 0\n",
47+
"duration 7\n",
48+
"campaign 0\n",
49+
"pdays 0\n",
50+
"previous 0\n",
51+
"poutcome 0\n",
52+
"emp_var_rate 0\n",
53+
"cons_price_idx 0\n",
54+
"cons_conf_idx 0\n",
55+
"euribor3m 0\n",
56+
"nr_employed 0\n",
57+
"y 0\n",
58+
"dtype: int64"
59+
]
60+
},
61+
"execution_count": 3,
62+
"metadata": {},
63+
"output_type": "execute_result"
64+
}
65+
],
66+
"source": [
67+
"df.isna().sum()"
68+
]
69+
},
70+
{
71+
"cell_type": "markdown",
72+
"metadata": {},
73+
"source": [
74+
"### 3. Impute the missing values in the numerical age column with the column mean. First compute the mean with mean(), then fill the missing values with fillna()"
75+
]
76+
},
77+
{
78+
"cell_type": "code",
79+
"execution_count": 5,
80+
"metadata": {},
81+
"outputs": [
82+
{
83+
"data": {
84+
"text/plain": [
85+
"40.023812413525256"
86+
]
87+
},
88+
"execution_count": 5,
89+
"metadata": {},
90+
"output_type": "execute_result"
91+
}
92+
],
93+
"source": [
94+
"mean_age = df.age.mean()\n",
95+
"mean_age"
96+
]
97+
},
98+
{
99+
"cell_type": "code",
100+
"execution_count": 6,
101+
"metadata": {},
102+
"outputs": [],
103+
"source": [
104+
"df['age'] = df['age'].fillna(mean_age)"
105+
]
106+
},
107+
{
108+
"cell_type": "markdown",
109+
"metadata": {},
110+
"source": [
111+
"### 4. Impute the missing values in the numerical duration column with the column median. First compute the median with median(), then fill the missing values with that median using fillna()"
112+
]
113+
},
114+
{
115+
"cell_type": "code",
116+
"execution_count": 7,
117+
"metadata": {},
118+
"outputs": [
119+
{
120+
"data": {
121+
"text/plain": [
122+
"180.0"
123+
]
124+
},
125+
"execution_count": 7,
126+
"metadata": {},
127+
"output_type": "execute_result"
128+
}
129+
],
130+
"source": [
131+
"median_duration = df.duration.median()\n",
132+
"median_duration"
133+
]
134+
},
135+
{
136+
"cell_type": "code",
137+
"execution_count": 8,
138+
"metadata": {},
139+
"outputs": [],
140+
"source": [
141+
"df['duration'] = df['duration'].fillna(median_duration)"
142+
]
143+
},
144+
{
145+
"cell_type": "markdown",
146+
"metadata": {},
147+
"source": [
148+
"### 5. Impute the missing values in the categorical contact column with the column mode. First compute the mode with mode(), then fill the missing values with fillna()"
149+
]
150+
},
151+
{
152+
"cell_type": "code",
153+
"execution_count": 9,
154+
"metadata": {},
155+
"outputs": [
156+
{
157+
"data": {
158+
"text/plain": [
159+
"'cellular'"
160+
]
161+
},
162+
"execution_count": 9,
163+
"metadata": {},
164+
"output_type": "execute_result"
165+
}
166+
],
167+
"source": [
168+
"mode_contact = df.contact.mode()[0]\n",
169+
"mode_contact"
170+
]
171+
},
172+
{
173+
"cell_type": "code",
174+
"execution_count": 10,
175+
"metadata": {},
176+
"outputs": [],
177+
"source": [
178+
"df['contact'] = df['contact'].fillna(mode_contact)"
179+
]
180+
},
181+
{
182+
"cell_type": "markdown",
183+
"metadata": {},
184+
"source": [
185+
"### 6. Print the number of missing values in each column again, using isna(), to confirm that none remain"
186+
]
187+
},
188+
{
189+
"cell_type": "code",
190+
"execution_count": 12,
191+
"metadata": {},
192+
"outputs": [
193+
{
194+
"data": {
195+
"text/plain": [
196+
"age 0\n",
197+
"job 0\n",
198+
"marital 0\n",
199+
"education 0\n",
200+
"default 0\n",
201+
"housing 0\n",
202+
"loan 0\n",
203+
"contact 0\n",
204+
"month 0\n",
205+
"day_of_week 0\n",
206+
"duration 0\n",
207+
"campaign 0\n",
208+
"pdays 0\n",
209+
"previous 0\n",
210+
"poutcome 0\n",
211+
"emp_var_rate 0\n",
212+
"cons_price_idx 0\n",
213+
"cons_conf_idx 0\n",
214+
"euribor3m 0\n",
215+
"nr_employed 0\n",
216+
"y 0\n",
217+
"dtype: int64"
218+
]
219+
},
220+
"execution_count": 12,
221+
"metadata": {},
222+
"output_type": "execute_result"
223+
}
224+
],
225+
"source": [
226+
"df.isna().sum()"
227+
]
228+
}
229+
],
230+
"metadata": {
231+
"kernelspec": {
232+
"display_name": "Python 3",
233+
"language": "python",
234+
"name": "python3"
235+
},
236+
"language_info": {
237+
"codemirror_mode": {
238+
"name": "ipython",
239+
"version": 3
240+
},
241+
"file_extension": ".py",
242+
"mimetype": "text/x-python",
243+
"name": "python",
244+
"nbconvert_exporter": "python",
245+
"pygments_lexer": "ipython3",
246+
"version": "3.6.4"
247+
}
248+
},
249+
"nbformat": 4,
250+
"nbformat_minor": 2
251+
}
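
The imputation steps in this notebook can be condensed into one helper, sketched here under some assumptions. The toy frame `toy` and the function name `impute_columns` are hypothetical and stand in for `Banking_Marketing.csv`, which is not reproduced here; only the column names `age`, `duration`, and `contact` come from the notebook. The sketch assigns the result of `fillna()` back to the column rather than calling it with `inplace=True` on an attribute, since in-place filling through a column accessor may not update the parent frame in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column
# (a stand-in for the real banking dataset).
toy = pd.DataFrame({
    "age": [30.0, np.nan, 50.0, 40.0],
    "duration": [100.0, 200.0, np.nan, 300.0],
    "contact": ["cellular", None, "cellular", "telephone"],
})


def impute_columns(frame):
    """Return a copy with age, duration, and contact imputed."""
    out = frame.copy()
    # Numerical column: fill with the mean (missing values are skipped).
    out["age"] = out["age"].fillna(out["age"].mean())
    # Numerical column: fill with the median.
    out["duration"] = out["duration"].fillna(out["duration"].median())
    # Categorical column: mode() returns a Series, so take its first entry.
    out["contact"] = out["contact"].fillna(out["contact"].mode()[0])
    return out


imputed = impute_columns(toy)
# Per-column count of remaining missing values.
print(imputed.isna().sum())
```

Working on a copy leaves the original frame untouched, which makes it easy to compare missing-value counts before and after imputation.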
