More efficient alternative in 04_ApplyStudents_Alcohol_Consumption #78

rahimnathwani · 2019-03-06T02:55:51Z

In step 10, we want to multiply all numerical values by 10.

The provided solution is:
df.applymap(times10).head(10)

But this is slow, because it calls a regular Python function on every element of the DataFrame.
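To make that cost concrete, here is a minimal sketch of the element-wise approach on a tiny made-up frame. The thread doesn't quote the exercise's `times10`, so the version below is a hypothetical stand-in that scales integers and passes everything else through. A counter records how often it runs, which shows that the function is invoked for each cell. The sketch also allows for the fact that pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map`, so it picks whichever one the installed version provides.

```python
import numpy as np
import pandas as pd

calls = 0  # how many times times10 gets invoked


def times10(x):
    """Hypothetical stand-in for the exercise's helper: scale integers, pass everything else through."""
    global calls
    calls += 1
    if isinstance(x, (int, np.integer)) and not isinstance(x, bool):
        return 10 * x
    return x


# Tiny made-up frame; column names echo the student dataset, values are invented
df = pd.DataFrame({"school": ["GP", "MS", "GP"], "age": [18, 17, 15], "absences": [6, 4, 10]})

# DataFrame.applymap was renamed DataFrame.map in pandas 2.1; use whichever this version provides
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(times10)

print(calls)  # at least one Python call per cell (3 rows x 3 columns)
print(result)
```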

A better approach is to check each column's dtype and then use pandas' built-in (vectorized) multiplication on the whole column:

# Multiply only the int64 columns, one vectorized operation per column
for colname, coltype in df.dtypes.items():
    if coltype == 'int64':
        df[colname] = df[colname] * 10
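Here is a quick check of the dtype loop on a small made-up frame. The column names are borrowed from the student dataset and the values are invented. The `int64` columns are scaled and the string column is left alone. The float column `grade`, which is hypothetical, is also left untouched, because the check matches `int64` only.

```python
import pandas as pd

# Made-up sample: two int64 columns, one float64 column, one string column
df = pd.DataFrame({
    "school": ["GP", "MS", "GP"],
    "age": [18, 17, 15],
    "absences": [6, 4, 10],
    "grade": [10.5, 12.0, 14.0],
})

# Equivalent to the loop above: one vectorized multiplication per int64 column
for colname, coltype in df.dtypes.items():
    if coltype == "int64":
        df[colname] = df[colname] * 10

print(df)  # age and absences scaled; school and the float column grade unchanged
```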

I used %%timeit to compare the two solutions. On this small dataset, my solution is about 5x as fast (1.1 ms vs 5.8 ms). The gap should widen on larger datasets, since the element-wise version pays Python function-call overhead for every cell.

pcarlitz · 2019-06-11T03:15:24Z

What if a column isn't int64, though? This might work better:

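import numpy as np

# Keep only the numeric columns (ints and floats); .copy() avoids SettingWithCopyWarning
newdf = df.select_dtypes(include=[np.number]).copy()
for column in newdf.columns:
    newdf[column] = newdf[column] * 10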
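That snippet builds `newdf` from the numeric columns only, so the string columns are not carried over. If the goal is to keep every column in one frame, one variant is to select the numeric column names and multiply them as a single block. This is my own substitution, not the snippet above. Like that snippet, it treats ints and floats alike. The sample data is made up.

```python
import pandas as pd

# Made-up sample with int, float and string columns
df = pd.DataFrame({
    "school": ["GP", "MS", "GP"],
    "age": [18, 17, 15],
    "grade": [10.5, 12.0, 14.0],
})

# Names of every numeric column; "number" is the string alias for np.number
num_cols = df.select_dtypes(include="number").columns

# Multiply all numeric columns at once and write them back; non-numeric columns stay in df
df[num_cols] = df[num_cols] * 10

print(df)
```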

guipsamora · 2019-10-13T14:11:11Z

@pcarlitz have you measured the performance? I am in favor of the fastest solution.
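One way to answer this is a small `timeit` harness that runs the three approaches from this thread on the same synthetic frame. The data, the `times10` helper and the function names are all hypothetical. No timings are claimed here, since they depend on the machine and the pandas version. The comparison is also not like-for-like. The `int64`-only loop skips the float column, and the `select_dtypes` loop drops the string column.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in data (not the exercise's CSV): 10,000 rows, two int, one float, one string column
rng = np.random.default_rng(0)
n = 10_000
base = pd.DataFrame({
    "age": rng.integers(15, 23, n),
    "absences": rng.integers(0, 30, n),
    "grade": rng.random(n) * 20,
    "school": rng.choice(["GP", "MS"], n),
})


def times10(x):
    """Hypothetical helper: scale numbers, pass strings through."""
    return x * 10 if isinstance(x, (int, float, np.number)) else x


def elementwise(df):
    # DataFrame.applymap was renamed DataFrame.map in pandas 2.1
    fn = df.map if hasattr(df, "map") else df.applymap
    return fn(times10)


def int64_loop(df):
    df = df.copy()
    for colname, coltype in df.dtypes.items():
        if coltype == "int64":
            df[colname] = df[colname] * 10
    return df


def numeric_loop(df):
    # Numeric columns only; the string column is not carried over
    newdf = df.select_dtypes(include=[np.number]).copy()
    for column in newdf.columns:
        newdf[column] = newdf[column] * 10
    return newdf


timings = {}
for name, func in [("elementwise", elementwise), ("int64_loop", int64_loop), ("numeric_loop", numeric_loop)]:
    # Best of 5 repeats of 3 calls each, reported per call in milliseconds
    best = min(timeit.repeat(lambda: func(base), number=3, repeat=5)) / 3
    timings[name] = best * 1000
    print(f"{name:>12}: {timings[name]:.2f} ms")
```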