<a href="https://colab.research.google.com/github/drshahizan/Python_Tutorial/blob/main/big%20data/modin/lab_6_IntelModin_Vs_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 6: Intel® Modin Vs Pandas Performance Sample
The Intel® Modin Vs Pandas Performance code illustrates how to use Modin* to replace the Pandas API. The sample compares the performance of Intel® Distribution of Modin* and the performance of Pandas for specific dataframe operations.

# Purpose
Intel® Distribution of Modin* accelerates Pandas operations using the Ray or Dask execution engine. The distribution is compatible with existing Pandas code and integrates with it. The sample code demonstrates how to perform some basic dataframe operations using Pandas and Intel® Distribution of Modin*, so you can compare the performance of the two.

# Key Implementation Details
This code sample is implemented for CPU using the Python programming language. It requires the NumPy, Pandas, and Modin libraries, and Python's built-in time module.

## Install Modin

In [None]:
#Install modin
!pip install modin[all]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting modin[all]
  Downloading modin-0.18.0-py3-none-any.whl (970 kB)
Collecting pandas==1.5.2
  Downloading pandas-1.5.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.2 MB)
Collecting ray[default]>=1.13.0
  Downloading ray-2.2.0-cp38-cp38-manylinux2014_x86_64.whl (57.4 MB)
Collecting modin-spreadsheet>=0.1.0
  Downloading modin_spreadsheet-0.1.2-py2.py3-none-any.whl (1.8 MB)
Collecting unidist[mpi]>=0.2.1
  Downloading unidist-0.2.1-py3-none-any.whl (102 kB)
Collecting boto3
  Downloading boto3-1.26.34-py3-none-any.whl (132 kB)

## Import required libraries

In [None]:
import numpy as np
import time
import pandas as pd
import modin.pandas as md

## Create an array of random integers

In [None]:
#Create a 2**15 x 2**10 array of random integers in [10, 1000)
arr = np.random.randint(low=10, high=1000, size=(2**15, 2**10))
#Save it as a csv file (no header row; values are written in floating-point notation)
np.savetxt("data.csv", arr, delimiter=",")
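Because `np.savetxt` writes no header row by default, `read_csv` takes the first data row as the column names, which is why the outputs later in this lab show column labels such as `5.280000000000000000e+02`. The following is a minimal sketch of one way to avoid that, assuming you are free to change how the file is written. The column names `c0`, `c1`, ... and the file name `data_small.csv` are hypothetical and chosen only for illustration; the array is kept small so the example runs quickly.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small illustrative array; the lab itself uses shape (2**15, 2**10).
rng = np.random.default_rng(0)
arr = rng.integers(low=10, high=1000, size=(8, 4))

# Hypothetical column names, introduced only for this sketch.
columns = [f"c{i}" for i in range(arr.shape[1])]

path = os.path.join(tempfile.mkdtemp(), "data_small.csv")
np.savetxt(
    path,
    arr,
    delimiter=",",
    fmt="%d",                  # write integers instead of scientific notation
    header=",".join(columns),  # explicit header row
    comments="",               # suppress the default "# " prefix on the header
)

df = pd.read_csv(path)
print(df.shape)
```

An alternative that leaves the file unchanged is to pass `header=None` to `read_csv`, which keeps the first row as data and assigns integer column labels.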

## Read CSV file

In [None]:
#Read data.csv file using Pandas
%time p_df = pd.read_csv("data.csv")

CPU times: user 5.98 s, sys: 525 ms, total: 6.5 s
Wall time: 6.52 s


In [None]:
#Read data.csv file using Modin
%time m_df = md.read_csv("data.csv")


    import ray
    ray.init(runtime_env={'env_vars': {'__MODIN_AUTOIMPORT_PANDAS__': '1'}})

2022-12-21 07:52:16,502 INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265


CPU times: user 1.73 s, sys: 647 ms, total: 2.37 s
Wall time: 14.7 s
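
The Modin wall time above (14.7 s against 2.37 s of CPU time) appears to include starting the local Ray instance reported in the log, so a single `%time` call here mostly measures one-time setup. A minimal sketch of a repeatable timer built on the `time` module imported earlier is shown below. `best_wall_time` is a hypothetical helper name, not part of Pandas or Modin, and the small frame stands in for the lab's data.

```python
import time

import numpy as np
import pandas as pd


def best_wall_time(func, repeats=3):
    """Return the smallest wall-clock time (seconds) over `repeats` calls of func."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Small illustrative frame; the lab's frame is much larger.
df = pd.DataFrame(np.arange(12).reshape(3, 4))
elapsed = best_wall_time(lambda: df.mean(axis=0))
print(f"best of 3: {elapsed:.6f} s")
```

Taking the best of several runs keeps one-time costs such as engine startup out of the comparison. Modin engines may also schedule work asynchronously, so very short Modin timings should be read with caution unless the result is actually materialized.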


## Mean operation

In [None]:
#Compute mean of each numerical column using Pandas
%time p_df.mean(axis=0)

CPU times: user 83.4 ms, sys: 4.02 ms, total: 87.5 ms
Wall time: 89.9 ms


5.280000000000000000e+02      505.833369
6.150000000000000000e+02      508.127812
4.570000000000000000e+02      502.553331
1.670000000000000000e+02      504.945006
5.300000000000000000e+01      503.792993
                                 ...    
3.920000000000000000e+02      504.839228
3.150000000000000000e+02.2    507.107303
9.300000000000000000e+02.2    501.581469
6.560000000000000000e+02.2    502.876644
2.800000000000000000e+02      506.612049
Length: 1024, dtype: float64

In [None]:
#Compute mean of each numerical column using Modin
%time m_df.mean(axis=0)

CPU times: user 23.3 ms, sys: 2.02 ms, total: 25.3 ms
Wall time: 29.8 ms


5.280000000000000000e+02      505.833369
6.150000000000000000e+02      508.127812
4.570000000000000000e+02      502.553331
1.670000000000000000e+02      504.945006
5.300000000000000000e+01      503.792993
                                 ...    
3.920000000000000000e+02      504.839228
3.150000000000000000e+02.2    507.107303
9.300000000000000000e+02.2    501.581469
6.560000000000000000e+02.2    502.876644
2.800000000000000000e+02      506.612049
Length: 1024, dtype: float64
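
The Pandas and Modin outputs above agree to the printed precision. As a sanity check on what `mean(axis=0)` computes, the sketch below compares the Pandas result with NumPy's column-wise mean on a small array. It assumes only NumPy and Pandas, and the array shape is chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
arr = rng.integers(low=10, high=1000, size=(16, 4))
df = pd.DataFrame(arr)

pandas_means = df.mean(axis=0)  # one mean per column
numpy_means = arr.mean(axis=0)  # the same reduction in NumPy

print(np.allclose(pandas_means.to_numpy(), numpy_means))
```

Comparing with `np.allclose` rather than `==` tolerates small floating-point differences that can arise when a reduction is computed in a different order.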

## Concatenation

In [None]:
#Concatenate the DataFrame with itself using Pandas
%time pd.concat([p_df, p_df], axis=0)

CPU times: user 152 ms, sys: 33.2 ms, total: 185 ms
Wall time: 193 ms


Unnamed: 0,5.280000000000000000e+02,6.150000000000000000e+02,4.570000000000000000e+02,1.670000000000000000e+02,5.300000000000000000e+01,1.490000000000000000e+02,3.300000000000000000e+02,9.720000000000000000e+02,3.970000000000000000e+02,2.800000000000000000e+01,...,9.710000000000000000e+02.1,4.000000000000000000e+01.1,4.680000000000000000e+02,2.620000000000000000e+02,2.890000000000000000e+02.1,3.920000000000000000e+02,3.150000000000000000e+02.2,9.300000000000000000e+02.2,6.560000000000000000e+02.2,2.800000000000000000e+02
0,371.0,703.0,536.0,602.0,913.0,337.0,476.0,309.0,995.0,173.0,...,529.0,976.0,923.0,847.0,944.0,597.0,252.0,463.0,107.0,679.0
1,931.0,62.0,435.0,746.0,387.0,249.0,652.0,872.0,960.0,737.0,...,95.0,303.0,748.0,313.0,741.0,116.0,427.0,634.0,558.0,309.0
2,197.0,575.0,18.0,935.0,821.0,147.0,478.0,343.0,443.0,190.0,...,120.0,578.0,147.0,185.0,683.0,609.0,489.0,418.0,946.0,899.0
3,149.0,280.0,565.0,689.0,665.0,829.0,198.0,790.0,960.0,535.0,...,522.0,250.0,404.0,634.0,722.0,439.0,526.0,990.0,621.0,242.0
4,946.0,304.0,382.0,275.0,543.0,657.0,480.0,274.0,531.0,521.0,...,675.0,863.0,91.0,253.0,24.0,305.0,756.0,869.0,549.0,933.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32762,249.0,685.0,793.0,558.0,658.0,510.0,797.0,339.0,267.0,133.0,...,670.0,291.0,446.0,234.0,112.0,704.0,905.0,47.0,376.0,644.0
32763,436.0,126.0,844.0,121.0,105.0,399.0,28.0,171.0,337.0,57.0,...,809.0,642.0,758.0,261.0,454.0,987.0,730.0,726.0,928.0,644.0
32764,416.0,580.0,213.0,993.0,923.0,825.0,143.0,988.0,75.0,918.0,...,711.0,919.0,619.0,422.0,731.0,415.0,657.0,179.0,713.0,204.0
32765,874.0,471.0,59.0,700.0,304.0,191.0,499.0,85.0,98.0,485.0,...,794.0,561.0,355.0,591.0,109.0,90.0,955.0,493.0,286.0,456.0


In [None]:
#Concatenate the DataFrame with itself using Modin
%time md.concat([m_df, m_df], axis=0)

CPU times: user 16.2 ms, sys: 926 µs, total: 17.1 ms
Wall time: 18.1 ms


Unnamed: 0,5.280000000000000000e+02,6.150000000000000000e+02,4.570000000000000000e+02,1.670000000000000000e+02,5.300000000000000000e+01,1.490000000000000000e+02,3.300000000000000000e+02,9.720000000000000000e+02,3.970000000000000000e+02,2.800000000000000000e+01,...,9.710000000000000000e+02.1,4.000000000000000000e+01.1,4.680000000000000000e+02,2.620000000000000000e+02,2.890000000000000000e+02.1,3.920000000000000000e+02,3.150000000000000000e+02.2,9.300000000000000000e+02.2,6.560000000000000000e+02.2,2.800000000000000000e+02
0,371.0,703.0,536.0,602.0,913.0,337.0,476.0,309.0,995.0,173.0,...,529.0,976.0,923.0,847.0,944.0,597.0,252.0,463.0,107.0,679.0
1,931.0,62.0,435.0,746.0,387.0,249.0,652.0,872.0,960.0,737.0,...,95.0,303.0,748.0,313.0,741.0,116.0,427.0,634.0,558.0,309.0
2,197.0,575.0,18.0,935.0,821.0,147.0,478.0,343.0,443.0,190.0,...,120.0,578.0,147.0,185.0,683.0,609.0,489.0,418.0,946.0,899.0
3,149.0,280.0,565.0,689.0,665.0,829.0,198.0,790.0,960.0,535.0,...,522.0,250.0,404.0,634.0,722.0,439.0,526.0,990.0,621.0,242.0
4,946.0,304.0,382.0,275.0,543.0,657.0,480.0,274.0,531.0,521.0,...,675.0,863.0,91.0,253.0,24.0,305.0,756.0,869.0,549.0,933.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32762,249.0,685.0,793.0,558.0,658.0,510.0,797.0,339.0,267.0,133.0,...,670.0,291.0,446.0,234.0,112.0,704.0,905.0,47.0,376.0,644.0
32763,436.0,126.0,844.0,121.0,105.0,399.0,28.0,171.0,337.0,57.0,...,809.0,642.0,758.0,261.0,454.0,987.0,730.0,726.0,928.0,644.0
32764,416.0,580.0,213.0,993.0,923.0,825.0,143.0,988.0,75.0,918.0,...,711.0,919.0,619.0,422.0,731.0,415.0,657.0,179.0,713.0,204.0
32765,874.0,471.0,59.0,700.0,304.0,191.0,499.0,85.0,98.0,485.0,...,794.0,561.0,355.0,591.0,109.0,90.0,955.0,493.0,286.0,456.0
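
Concatenating along `axis=0` stacks the rows and keeps each frame's original index labels, which is consistent with the last row shown above still being labelled 32765. A small sketch of that behaviour, using a three-row frame invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Row labels repeat: 0, 1, 2, 0, 1, 2
stacked = pd.concat([df, df], axis=0)

# ignore_index=True renumbers the rows 0..5
renumbered = pd.concat([df, df], axis=0, ignore_index=True)

print(len(stacked), list(stacked.index))
print(len(renumbered), list(renumbered.index))
```

Pass `ignore_index=True` when the repeated labels are not meaningful and a fresh 0-based index is preferred.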


## applymap() method

In [None]:
#Element-wise multiplication of each element by 2 using Pandas
%time p_df.applymap(lambda i:i*2)

CPU times: user 8.05 s, sys: 192 ms, total: 8.24 s
Wall time: 8.51 s


Unnamed: 0,5.280000000000000000e+02,6.150000000000000000e+02,4.570000000000000000e+02,1.670000000000000000e+02,5.300000000000000000e+01,1.490000000000000000e+02,3.300000000000000000e+02,9.720000000000000000e+02,3.970000000000000000e+02,2.800000000000000000e+01,...,9.710000000000000000e+02.1,4.000000000000000000e+01.1,4.680000000000000000e+02,2.620000000000000000e+02,2.890000000000000000e+02.1,3.920000000000000000e+02,3.150000000000000000e+02.2,9.300000000000000000e+02.2,6.560000000000000000e+02.2,2.800000000000000000e+02
0,742.0,1406.0,1072.0,1204.0,1826.0,674.0,952.0,618.0,1990.0,346.0,...,1058.0,1952.0,1846.0,1694.0,1888.0,1194.0,504.0,926.0,214.0,1358.0
1,1862.0,124.0,870.0,1492.0,774.0,498.0,1304.0,1744.0,1920.0,1474.0,...,190.0,606.0,1496.0,626.0,1482.0,232.0,854.0,1268.0,1116.0,618.0
2,394.0,1150.0,36.0,1870.0,1642.0,294.0,956.0,686.0,886.0,380.0,...,240.0,1156.0,294.0,370.0,1366.0,1218.0,978.0,836.0,1892.0,1798.0
3,298.0,560.0,1130.0,1378.0,1330.0,1658.0,396.0,1580.0,1920.0,1070.0,...,1044.0,500.0,808.0,1268.0,1444.0,878.0,1052.0,1980.0,1242.0,484.0
4,1892.0,608.0,764.0,550.0,1086.0,1314.0,960.0,548.0,1062.0,1042.0,...,1350.0,1726.0,182.0,506.0,48.0,610.0,1512.0,1738.0,1098.0,1866.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32762,498.0,1370.0,1586.0,1116.0,1316.0,1020.0,1594.0,678.0,534.0,266.0,...,1340.0,582.0,892.0,468.0,224.0,1408.0,1810.0,94.0,752.0,1288.0
32763,872.0,252.0,1688.0,242.0,210.0,798.0,56.0,342.0,674.0,114.0,...,1618.0,1284.0,1516.0,522.0,908.0,1974.0,1460.0,1452.0,1856.0,1288.0
32764,832.0,1160.0,426.0,1986.0,1846.0,1650.0,286.0,1976.0,150.0,1836.0,...,1422.0,1838.0,1238.0,844.0,1462.0,830.0,1314.0,358.0,1426.0,408.0
32765,1748.0,942.0,118.0,1400.0,608.0,382.0,998.0,170.0,196.0,970.0,...,1588.0,1122.0,710.0,1182.0,218.0,180.0,1910.0,986.0,572.0,912.0


In [None]:
#Element-wise multiplication of each element by 2 using Modin
%time m_df.applymap(lambda i:i*2)

CPU times: user 4.29 ms, sys: 1.11 ms, total: 5.39 ms
Wall time: 4.93 ms


Unnamed: 0,5.280000000000000000e+02,6.150000000000000000e+02,4.570000000000000000e+02,1.670000000000000000e+02,5.300000000000000000e+01,1.490000000000000000e+02,3.300000000000000000e+02,9.720000000000000000e+02,3.970000000000000000e+02,2.800000000000000000e+01,...,9.710000000000000000e+02.1,4.000000000000000000e+01.1,4.680000000000000000e+02,2.620000000000000000e+02,2.890000000000000000e+02.1,3.920000000000000000e+02,3.150000000000000000e+02.2,9.300000000000000000e+02.2,6.560000000000000000e+02.2,2.800000000000000000e+02
0,742.0,1406.0,1072.0,1204.0,1826.0,674.0,952.0,618.0,1990.0,346.0,...,1058.0,1952.0,1846.0,1694.0,1888.0,1194.0,504.0,926.0,214.0,1358.0
1,1862.0,124.0,870.0,1492.0,774.0,498.0,1304.0,1744.0,1920.0,1474.0,...,190.0,606.0,1496.0,626.0,1482.0,232.0,854.0,1268.0,1116.0,618.0
2,394.0,1150.0,36.0,1870.0,1642.0,294.0,956.0,686.0,886.0,380.0,...,240.0,1156.0,294.0,370.0,1366.0,1218.0,978.0,836.0,1892.0,1798.0
3,298.0,560.0,1130.0,1378.0,1330.0,1658.0,396.0,1580.0,1920.0,1070.0,...,1044.0,500.0,808.0,1268.0,1444.0,878.0,1052.0,1980.0,1242.0,484.0
4,1892.0,608.0,764.0,550.0,1086.0,1314.0,960.0,548.0,1062.0,1042.0,...,1350.0,1726.0,182.0,506.0,48.0,610.0,1512.0,1738.0,1098.0,1866.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32762,498.0,1370.0,1586.0,1116.0,1316.0,1020.0,1594.0,678.0,534.0,266.0,...,1340.0,582.0,892.0,468.0,224.0,1408.0,1810.0,94.0,752.0,1288.0
32763,872.0,252.0,1688.0,242.0,210.0,798.0,56.0,342.0,674.0,114.0,...,1618.0,1284.0,1516.0,522.0,908.0,1974.0,1460.0,1452.0,1856.0,1288.0
32764,832.0,1160.0,426.0,1986.0,1846.0,1650.0,286.0,1976.0,150.0,1836.0,...,1422.0,1838.0,1238.0,844.0,1462.0,830.0,1314.0,358.0,1426.0,408.0
32765,1748.0,942.0,118.0,1400.0,608.0,382.0,998.0,170.0,196.0,970.0,...,1588.0,1122.0,710.0,1182.0,218.0,180.0,1910.0,986.0,572.0,912.0
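
`applymap` calls the Python lambda once per element, which is a large part of why the Pandas run above takes seconds. For simple arithmetic such as doubling, a vectorized expression gives the same values without the per-element call. The sketch below assumes only NumPy and Pandas and uses a tiny frame for illustration. It also accounts for `DataFrame.applymap` being deprecated in favour of `DataFrame.map` from pandas 2.1 onward.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6, dtype=float).reshape(2, 3), columns=["a", "b", "c"])

# Element-wise Python callback, as in the lab. DataFrame.map replaces
# DataFrame.applymap in pandas 2.1 and later, so pick whichever exists.
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled_callback = elementwise(lambda i: i * 2)

# Vectorized arithmetic: same values, no per-element Python call.
doubled_vectorized = df * 2

print(doubled_vectorized.equals(doubled_callback))
```

The element-wise form remains useful when the function cannot be expressed as array arithmetic.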
