#About
This notebook shows how to use the client wrapper for the data popularity API service. The service runs in a Docker container.

For background on this service, see **howto_01_datapop.ipynb** and **howto_02_datapopserv.ipynb** first.

#Docker pull & run
Before you start, run the Docker container that provides the data popularity API service.

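In a terminal, run:
1. `sudo docker pull hushchynmikhail/datapopserv`
2. `sudo docker run -d -p 5000:5000 hushchynmikhail/datapopserv python DataPopularity/datapopserv/datapopserv/app.py`

The second command starts the container in the background (`-d`) and publishes container port 5000 on port 5000 of the host (`-p 5000:5000`).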
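Because the container starts in the background, the API may need a moment before it accepts connections. Below is a minimal sketch that polls the service URL before the client is created. `http_probe` and `wait_for_service` are hypothetical helpers, not part of `datapopclient`; they only assume that the service answers HTTP at `http://localhost:5000`, and any HTTP response (even an error status) counts as "up".

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=1.0):
    """Return True if anything answers HTTP at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, even if with an error status (e.g. 404 on '/').
        return True
    except OSError:
        # Connection refused, reset or timed out: the service is not up yet.
        return False


def wait_for_service(url, timeout=30.0, interval=1.0, probe=http_probe):
    """Poll `url` with `probe` until it succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage, after `docker run` (same URL as passed to DataPopularityClient):
# if not wait_for_service('http://localhost:5000', timeout=60):
#     raise RuntimeError('data popularity service did not start')
```

The probe is a parameter so the polling loop can be reused with a stricter check (for example, requesting a specific endpoint) without changing the loop.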

#Init DataPopularityClient

In [1]:
from datapopclient import DataPopularityClient

dpaw = DataPopularityClient(service_url='http://localhost:5000')

#Upload data

In [2]:
data_path = 'Data/popularity-728days.csv'
dpaw.upload(data_path=data_path)

1
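`upload` sends a local file to the service, so a quick local check can catch a wrong `data_path` before anything is sent. A minimal stdlib sketch; `check_csv` is a hypothetical helper, and it makes no assumption about which columns the service expects, only that the file is a CSV with a header row.

```python
import csv
import os


def check_csv(path):
    """Return (header, number_of_data_rows) for a CSV file before uploading it.

    Raises FileNotFoundError if the file is missing and ValueError if it has
    no header row or no data rows.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header:
            raise ValueError(f'{path}: no header row')
        # Count non-empty data rows after the header.
        n_rows = sum(1 for row in reader if row)
    if n_rows == 0:
        raise ValueError(f'{path}: no data rows')
    return header, n_rows


# Usage before the upload cell:
# header, n_rows = check_csv('Data/popularity-728days.csv')
```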

#Run algorithm
The following method runs DataPopularityEstimator and DataIntensityPredictor.
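The `nb_of_weeks=104` used below appears to match the span of the input file: `popularity-728days.csv` covers 728 days, and 728 / 7 = 104 weeks. A small sketch that derives the value from the file name; `weeks_from_filename` is hypothetical and assumes the `<N>days` naming convention also holds for other input files.

```python
import re


def weeks_from_filename(path):
    """Return the number of whole weeks encoded as '<N>days' in a file name.

    Hypothetical helper: it assumes the '<N>days' naming used by
    'popularity-728days.csv' and raises ValueError if N is not a whole
    number of weeks.
    """
    match = re.search(r'(\d+)days', path)
    if match is None:
        raise ValueError(f'no "<N>days" pattern in {path!r}')
    days = int(match.group(1))
    weeks, rest = divmod(days, 7)
    if rest:
        raise ValueError(f'{days} days is not a whole number of weeks')
    return weeks


print(weeks_from_filename('Data/popularity-728days.csv'))  # 104
```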

In [3]:
%%time
dpaw.run_algorithm(nb_of_weeks=104)

CPU times: user 6.78 ms, sys: 7.77 ms, total: 14.6 ms
Wall time: 4min 32s
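In the output above, the notebook process used about 15 ms of CPU time while the cell took 4 min 32 s of wall time: the client spends almost the whole call waiting for the service, which runs the estimators inside the container. A stdlib sketch of the same two measurements that `%%time` reports; `timed` is a hypothetical helper and `time.sleep` stands in for waiting on the service's reply.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, cpu_seconds, wall_seconds), like %%time.

    process_time() counts CPU time of this process only, so time spent
    blocked (sleeping, or waiting for a server's reply) shows up in the
    wall time but not in the CPU time.
    """
    cpu0, wall0 = time.process_time(), time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.process_time() - cpu0, time.perf_counter() - wall0


# time.sleep stands in for waiting on the HTTP response.
_, cpu, wall = timed(time.sleep, 0.5)
print(f'CPU {cpu * 1000:.1f} ms, wall {wall * 1000:.0f} ms')
```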


#Get data popularity

In [4]:
%%time
popularity = dpaw.get_data_popularity()

CPU times: user 38.4 ms, sys: 243 µs, total: 38.6 ms
Wall time: 41 ms


In [5]:
popularity.iloc[0:5]  # first five rows; .iloc replaces the deprecated DataFrame.irow

| Unnamed: 0.1 | Unnamed: 0 | Name | Popularity | Label |
|---|---|---|---|---|
| 0 | 0 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.000157 | 0 |
| 1 | 1 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.370487 | 0 |
| 2 | 2 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.427786 | 1 |
| 3 | 3 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.000157 | 0 |
| 4 | 4 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.356515 | 1 |
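A common next step with this table is to rank datasets by `Popularity`. A stdlib sketch using the five rows shown above; the `Name` column is left out because the output truncates every name to the same prefix, and `top_popular` is a hypothetical helper.

```python
import csv
import io

# Rows 0-4 of the popularity table above (Name omitted, see lead-in).
POPULARITY_CSV = """Unnamed: 0,Popularity,Label
0,0.000157,0
1,0.370487,0
2,0.427786,1
3,0.000157,0
4,0.356515,1
"""


def top_popular(csv_text, n):
    """Return the n rows with the highest Popularity, most popular first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r['Popularity']), reverse=True)
    return rows[:n]


for row in top_popular(POPULARITY_CSV, 3):
    print(row['Unnamed: 0'], row['Popularity'], row['Label'])
```

With the DataFrame returned by the client, `popularity.sort_values('Popularity', ascending=False).head(3)` does the same in pandas.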


#Get predicted data intensity

In [6]:
%%time
prediction = dpaw.get_data_intensity_prediction()

CPU times: user 35 ms, sys: 0 ns, total: 35 ms
Wall time: 36 ms


In [7]:
prediction.iloc[0:5]  # first five rows; .iloc replaces the deprecated DataFrame.irow

| Unnamed: 0.1 | Unnamed: 0 | Name | Intensity | Std_error |
|---|---|---|---|---|
| 0 | 0 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 6.24164 | 29.241197 |
| 1 | 1 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 32.69593 | 105.950344 |
| 2 | 2 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.0 | 2e-06 |
| 3 | 3 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.180068 | 0.0 |
| 4 | 4 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 2.080052e-15 | 0.0 |
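Each prediction comes with a standard error, and for some datasets the error is much larger than the predicted intensity (row 1: 32.7 versus 106.0). A sketch that turns the pair into an approximate band, assuming `Std_error` is the standard error of a roughly normal prediction. The 1.96 factor (about 95% coverage under that assumption) and the clip at zero are choices made here, not documented behaviour of the service; `interval` is a hypothetical helper.

```python
# Rows 0-4 of the prediction table above: (Intensity, Std_error).
PREDICTION = [
    (6.24164, 29.241197),
    (32.69593, 105.950344),
    (0.0, 2e-06),
    (0.180068, 0.0),
    (2.080052e-15, 0.0),
]


def interval(intensity, std_error, z=1.96):
    """Approximate band intensity +/- z * std_error, clipped at zero.

    Assumes a normal approximation for the prediction error and that
    intensity cannot be negative; neither is documented by the service.
    """
    low = max(0.0, intensity - z * std_error)
    return low, intensity + z * std_error


for i, (mu, se) in enumerate(PREDICTION):
    low, high = interval(mu, se)
    print(f'{i}: {mu:.6g} in [{low:.6g}, {high:.6g}]')
```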


#Get optimization report
This method runs DataPlacementOptimizer and returns the report produced by the loss-function optimization.

In [8]:
%%time
opti_report = dpaw.get_opti_report(q=None, set_replicas='auto', c_disk=100, c_tape=1, c_miss=2000,
                                   alpha=1, min_replicas=1, max_replicas=4)

CPU times: user 31.3 ms, sys: 3.84 ms, total: 35.2 ms
Wall time: 2.51 s


In [9]:
opti_report.iloc[0:5]  # first five rows; .iloc replaces the deprecated DataFrame.irow

| Unnamed: 0.1 | Unnamed: 0 | Name | OnDisk | NbReplicas |
|---|---|---|---|---|
| 0 | 0 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 1 | 3 |
| 1 | 1 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0 | 1 |
| 2 | 2 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0 | 1 |
| 3 | 7338 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0 | 1 |
| 4 | 3 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 1 | 1 |
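A quick way to read the optimization report is to count how many datasets the optimizer keeps on disk and how many replicas it requests for them. A stdlib sketch over the five rows shown above; it assumes only that `OnDisk` is a 0/1 flag and `NbReplicas` a replica count, as the column names suggest, and `summarize` is a hypothetical helper.

```python
from collections import Counter

# Rows 0-4 of the optimization report above: (Unnamed: 0, OnDisk, NbReplicas).
OPTI_ROWS = [(0, 1, 3), (1, 0, 1), (2, 0, 1), (7338, 0, 1), (3, 1, 1)]


def summarize(rows):
    """Count datasets on / not on disk and replicas requested for on-disk ones."""
    on_disk = [r for r in rows if r[1] == 1]
    return {
        'on_disk': len(on_disk),
        'not_on_disk': len(rows) - len(on_disk),
        'disk_replicas': sum(r[2] for r in on_disk),
        'replica_histogram': Counter(r[2] for r in rows),
    }


print(summarize(OPTI_ROWS))
```

On the full DataFrame, `opti_report['OnDisk'].sum()` and `opti_report.loc[opti_report['OnDisk'] == 1, 'NbReplicas'].sum()` give the first and third numbers in pandas.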


#Get opti_performance report

This method runs Performance and returns the metrics shown below.

In [16]:
%%time
opti_performance = dpaw.get_opti_performance(q=None, set_replicas='auto', c_disk=100, c_tape=1, c_miss=3000,
                                             alpha=1, min_replicas=1, max_replicas=4)

CPU times: user 9.72 ms, sys: 2.78 ms, total: 12.5 ms
Wall time: 2.51 s


In [17]:
opti_performance

| Unnamed: 0.1 | Unnamed: 0 | Downloading_time_ratio (train) | Saving_space_(%) (train) | Nb_of_mistakes (train) |
|---|---|---|---|---|
| 0 | 0 | 0.956397 | 19.32928 | 8 |
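The optimizer calls in this notebook repeat the same keyword arguments and differ only in a few values (`c_miss=2000` for the optimization report, `c_miss=3000` here). A sketch that keeps the shared settings in one dictionary and overrides them per call; `BASE_PARAMS` and `with_overrides` are names introduced here, and the parameter values are copied from the cells above.

```python
# Optimizer settings shared by the get_opti_report, get_opti_performance and
# get_report calls in this notebook (values copied from those cells).
BASE_PARAMS = dict(q=None, set_replicas='auto', c_disk=100, c_tape=1, c_miss=2000,
                   alpha=1, min_replicas=1, max_replicas=4)


def with_overrides(**overrides):
    """Return a copy of BASE_PARAMS with some settings replaced.

    Unknown names raise KeyError, which catches typos such as 'c_mis'.
    """
    unknown = set(overrides) - set(BASE_PARAMS)
    if unknown:
        raise KeyError(f'unknown parameters: {sorted(unknown)}')
    return {**BASE_PARAMS, **overrides}


perf_params = with_overrides(c_miss=3000)
# With the client created earlier:
# opti_performance = dpaw.get_opti_performance(**perf_params)
# report = dpaw.get_report(**BASE_PARAMS, pop_cut=0.5)
```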


#Get report
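This method returns a combined report: for each dataset it lists the popularity, the predicted intensity, the size and the placement (`OnDisk`, `NbReplicas`), as shown below. It takes the same optimizer parameters as `get_opti_report` plus `pop_cut`.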

In [12]:
%%time
report = dpaw.get_report(q=None, set_replicas='auto', c_disk=100, c_tape=1, c_miss=2000,
                         alpha=1, min_replicas=1, max_replicas=4, pop_cut=0.5)

CPU times: user 43.1 ms, sys: 3.78 ms, total: 46.9 ms
Wall time: 864 ms


In [13]:
report.iloc[0:5]  # first five rows; .iloc replaces the deprecated DataFrame.irow

| Unnamed: 0.2 | Unnamed: 0 | Unnamed: 0.1 | Name | Popularity | Label | Intensity | LFNSize | OnDisk | NbReplicas | Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.000157 | 0 | 6.24164 | 0.3179 | 1 | 3 | 0 |
| 1 | 1 | 1 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.370487 | 0 | 32.695929 | 0.649204 | 1 | 4 | 0 |
| 2 | 2 | 2 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.427786 | 1 | 0.0 | 1.370105 | 1 | 1 | 0 |
| 3 | 7338 | 7338 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.779592 | 1 | 0.0 | 0.09529 | 0 | 1 | 0 |
| 4 | 3 | 3 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.000157 | 0 | 0.180068 | 0.803981 | 1 | 1 | 0 |
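The report puts size, placement and replica count side by side, so it can be used to estimate how much disk space a placement needs. A sketch over the five rows shown above; it assumes `LFNSize` is the size of one copy of a dataset and that `NbReplicas` copies are kept on disk when `OnDisk` is 1. The unit of `LFNSize` is not stated in this notebook, and `disk_footprint` is a hypothetical helper.

```python
# Rows 0-4 of the report above: (LFNSize, OnDisk, NbReplicas, Missing).
REPORT_ROWS = [
    (0.3179, 1, 3, 0),
    (0.649204, 1, 4, 0),
    (1.370105, 1, 1, 0),
    (0.09529, 0, 1, 0),
    (0.803981, 1, 1, 0),
]


def disk_footprint(rows):
    """Sum LFNSize * NbReplicas over datasets with OnDisk == 1.

    Assumes LFNSize is the size of a single copy; the result is in the same
    (undocumented) unit as LFNSize.
    """
    return sum(size * replicas for size, on_disk, replicas, _ in rows if on_disk == 1)


print(round(disk_footprint(REPORT_ROWS), 6))  # 5.724602
```

In pandas the same quantity is `(report['LFNSize'] * report['NbReplicas'])[report['OnDisk'] == 1].sum()`.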


#Get performance report

This cell calls `get_opti_performance` again, now with `alpha=0.01` and `max_replicas=7`.

In [22]:
%%time
performance = dpaw.get_opti_performance(q=None, set_replicas='auto', c_disk=100, c_tape=1, c_miss=3000,
                                        alpha=0.01, min_replicas=1, max_replicas=7)

CPU times: user 11.2 ms, sys: 0 ns, total: 11.2 ms
Wall time: 2.59 s


In [23]:
performance

| Unnamed: 0.1 | Unnamed: 0 | Downloading_time_ratio (train) | Saving_space_(%) (train) | Nb_of_mistakes (train) |
|---|---|---|---|---|
| 0 | 0 | 0.706348 | 35.558147 | 8 |
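The two `get_opti_performance` calls in this notebook share `c_disk`, `c_tape`, `c_miss` and `min_replicas` and differ in `alpha` (1 vs 0.01) and `max_replicas` (4 vs 7). A sketch that puts the two printed rows side by side; the values are copied from the outputs above, and the short keys `time_ratio`, `saving_pct` and `mistakes` are names chosen here for `Downloading_time_ratio (train)`, `Saving_space_(%) (train)` and `Nb_of_mistakes (train)`.

```python
# The two performance rows printed in this notebook (first run, then second).
RUNS = {
    'alpha=1, max_replicas=4': {'time_ratio': 0.956397, 'saving_pct': 19.32928, 'mistakes': 8},
    'alpha=0.01, max_replicas=7': {'time_ratio': 0.706348, 'saving_pct': 35.558147, 'mistakes': 8},
}


def compare(a, b):
    """Return metric-by-metric differences b - a."""
    return {key: b[key] - a[key] for key in a}


diff = compare(*RUNS.values())  # dicts keep insertion order: first run, second run
for key, value in diff.items():
    print(f'{key}: {value:+.6g}')
```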
