#About
This notebook shows how to use the Python wrapper for the data popularity API service. The service itself runs in a Docker container.

Before continuing, read http://nbviewer.ipython.org/github/hushchyn-mikhail/DataPopularity/blob/master/howto/howto_01_DataPopularity.ipynb and http://nbviewer.ipython.org/github/hushchyn-mikhail/DataPopularity/blob/master/howto/howto_02_data_popularity_api.ipynb for background on this service.

#Docker pull & run
Before you start, run the Docker container that provides the data popularity API service.

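In a terminal, run:
1. `sudo docker pull hushchynmikhail/dp_api`
2. `sudo docker run -d -p 5000:5000 hushchynmikhail/dp_api python DataPopularity/data_popularity_api/dp_api.py`

The second command starts the API server in the background (`-d`) and maps port 5000 of the container to port 5000 on the host (`-p 5000:5000`), which is why the wrapper below connects to `http://localhost:5000`.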
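The container can take a few seconds before it accepts requests. The following is a minimal sketch, using only the Python standard library, that polls the mapped port before you create the wrapper. `wait_for_service` is a hypothetical helper, not part of `DataPopularityApiWrapper`, and it only checks that something is listening on the port, not that the API answers correctly.

```python
import socket
import time


def wait_for_service(host="localhost", port=5000, timeout=60.0, interval=1.0):
    """Hypothetical helper: poll until host:port accepts a TCP connection.

    Returns True as soon as a connection succeeds, False once `timeout`
    seconds have passed. It does not call any endpoint of the API.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_service('localhost', 5000)` and continue only if it returns `True`; if it returns `False`, `sudo docker ps` shows whether the container is running.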

#Init DataPopularityApiWrapper

In [16]:
from DataPopularityApiWrapper import DataPopularityApiWrapper

dpaw = DataPopularityApiWrapper(service_url='http://localhost:5000')

#Upload data
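Before uploading, a quick local check can catch a wrong path or an empty file early. This is a minimal sketch, assuming pandas is installed. `check_popularity_csv` is a hypothetical helper, and since the expected column layout of the file is not described here, it only checks that the file exists, parses as CSV and has at least one data row.

```python
import os

import pandas as pd


def check_popularity_csv(data_path):
    """Hypothetical pre-upload check; returns the parsed DataFrame.

    Raises FileNotFoundError if the file is missing and ValueError if it
    cannot be parsed or has no data rows (pandas' EmptyDataError is a
    ValueError subclass).
    """
    if not os.path.isfile(data_path):
        raise FileNotFoundError(data_path)
    df = pd.read_csv(data_path)
    if df.empty:
        raise ValueError("%s contains no data rows" % data_path)
    return df
```

For example, `df = check_popularity_csv('Data/popularity-728days.csv')` followed by `print(df.shape, list(df.columns))` shows what is about to be uploaded.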

In [17]:
data_path = 'Data/popularity-728days.csv'
dpaw.upload(data_path=data_path)

1

#Run algorithm
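The following method runs DataPopularityEstimator and DataIntensityPredictor on the uploaded data. The value `nb_of_weeks=104` corresponds to the 728 days in the data file name (728 / 7 = 104). This is the slow step: the call below took about 7 minutes of wall time.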
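If you keep several data files named like `popularity-728days.csv`, the week count can be derived from the file name. This sketch rests on two assumptions: that the file name encodes the history length as `<N>days`, and that `nb_of_weeks` counts whole weeks of that history. `weeks_from_filename` is hypothetical and not part of the wrapper.

```python
import os
import re


def weeks_from_filename(data_path):
    """Hypothetical helper: nb_of_weeks from a name like 'popularity-728days.csv'.

    Assumes the file name contains '<N>days' and that nb_of_weeks counts
    whole weeks, i.e. N // 7. Neither is guaranteed by the API.
    """
    match = re.search(r"(\d+)days", os.path.basename(data_path))
    if match is None:
        raise ValueError("no '<N>days' pattern in %r" % data_path)
    return int(match.group(1)) // 7
```

With `data_path` from the upload step, `dpaw.run_algorithm(nb_of_weeks=weeks_from_filename(data_path))` reproduces the call below.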

In [18]:
%%time
dpaw.run_algorithm(nb_of_weeks=104)

CPU times: user 8.54 ms, sys: 9.57 ms, total: 18.1 ms
Wall time: 7min 11s


#Get data popularity

In [19]:
%%time
popularity = dpaw.get_data_popularity()

CPU times: user 28.2 ms, sys: 7.36 ms, total: 35.5 ms
Wall time: 40.1 ms


In [20]:
popularity.head(5)  # DataFrame.irow was removed from pandas; head(5) returns the first five rows

| Unnamed: 0.1 | Unnamed: 0 | Name | Popularity | Label |
|---|---|---|---|---|
| 0 | 0 | /MC/2010/Beam3500GeV-Oct2010-MagDown-Nu2.5/Sim... | 0.726217 | 1 |
| 1 | 1 | /MC/2011/Beam3500GeV-2011-MagDown-Fix1-EmNoCut... | 0.66405 | 1 |
| 2 | 2 | /MC/2012/Beam4000GeV-2012-MagUp-Nu2.5-Pythia8/... | 0.011929 | 0 |
| 3 | 3 | /MC/2012/Beam4000GeV-2012-MagUp-Nu2.5-Pythia8/... | 0.002393 | 0 |
| 4 | 4 | /MC/2011/Beam3500GeV-2011-MagDown-Nu2-EmNoCuts... | 0.801256 | 1 |
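The returned object appears to be a pandas `DataFrame`, so the usual pandas operations apply to it. A small sketch using the `Popularity` and `Label` values of the five rows shown; in the notebook, use `popularity` instead of `sample`. It assumes a `Label` of 1 marks a dataset the estimator considers popular.

```python
import pandas as pd

# Popularity and Label values copied from the five rows shown above.
sample = pd.DataFrame({
    "Popularity": [0.726217, 0.66405, 0.011929, 0.002393, 0.801256],
    "Label": [1, 1, 0, 0, 1],
})

# Datasets ranked from most to least popular
ranked = sample.sort_values("Popularity", ascending=False)

# Datasets with Label == 1 (assumed here to mean "popular")
labelled_popular = sample[sample["Label"] == 1]

print(ranked.index.tolist())   # [4, 0, 1, 2, 3]
print(len(labelled_popular))   # 3
```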


#Get predicted data intensity

In [21]:
%%time
prediction = dpaw.get_data_intensity_prediction()

CPU times: user 35.9 ms, sys: 0 ns, total: 35.9 ms
Wall time: 38.7 ms


In [22]:
prediction.head(5)  # DataFrame.irow was removed from pandas; head(5) returns the first five rows

| Unnamed: 0.1 | Unnamed: 0 | Name | Intensity | Std_error |
|---|---|---|---|---|
| 0 | 0 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 6.24164 | 29.241197 |
| 1 | 1 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 32.69593 | 105.950344 |
| 2 | 2 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.0 | 2e-06 |
| 3 | 3 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.180068 | 0.0 |
| 4 | 4 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 2.080052e-15 | 0.0 |
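A sketch for turning each prediction into a rough range, using the values of the five rows shown. It assumes `Std_error` is the standard error of `Intensity`, uses a ±2 standard-error band as an approximation, and clips the lower end at 0 on the further assumption that an intensity cannot be negative. In the notebook, use `prediction` instead of `sample`.

```python
import pandas as pd

# Intensity and Std_error values copied from the five rows shown above.
sample = pd.DataFrame({
    "Intensity": [6.24164, 32.69593, 0.0, 0.180068, 2.080052e-15],
    "Std_error": [29.241197, 105.950344, 2e-06, 0.0, 0.0],
})

# Assumption: Std_error is the standard error of Intensity.
# Rough band of +/- 2 standard errors; the lower end is clipped at 0
# because a negative intensity is assumed to be meaningless.
sample["Lower"] = (sample["Intensity"] - 2 * sample["Std_error"]).clip(lower=0)
sample["Upper"] = sample["Intensity"] + 2 * sample["Std_error"]

print(sample[["Intensity", "Lower", "Upper"]])
```

For the first two rows the band is several times wider than the predicted value itself, so those intensities are only loosely constrained.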


#Get optimization report
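This method runs DataPlacementOptimizer and returns the placement found by optimizing the loss function: for each dataset, whether it is kept on disk (`OnDisk`) and how many replicas it gets (`NbReplicas`).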

In [23]:
%%time
opti_report = dpaw.get_opti_report(q=None, set_replicas='auto', c_disk=100, c_tape=1, c_miss=2000,
                                   alpha=1, max_replicas=4)

CPU times: user 39.2 ms, sys: 0 ns, total: 39.2 ms
Wall time: 6.04 s


In [24]:
opti_report.head(5)  # DataFrame.irow was removed from pandas; head(5) returns the first five rows

| Unnamed: 0.1 | Unnamed: 0 | Name | OnDisk | NbReplicas |
|---|---|---|---|---|
| 0 | 670 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 1 | 3 |
| 1 | 3152 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 1 | 4 |
| 2 | 7992 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0 | 1 |
| 3 | 5847 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0 | 1 |
| 4 | 10331 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 1 | 1 |
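To summarize a placement, plain pandas aggregations are enough. A sketch on the `OnDisk` and `NbReplicas` values of the five rows shown, assuming `OnDisk` is a 0/1 flag for datasets kept on disk; in the notebook, use `opti_report` instead of `sample`.

```python
import pandas as pd

# OnDisk and NbReplicas values copied from the five rows shown above.
sample = pd.DataFrame({
    "OnDisk": [1, 1, 0, 0, 1],
    "NbReplicas": [3, 4, 1, 1, 1],
})

# Number of datasets kept on disk (assumes OnDisk is a 0/1 flag)
n_on_disk = int(sample["OnDisk"].sum())

# Total replicas of the datasets kept on disk
replicas_on_disk = int(sample.loc[sample["OnDisk"] == 1, "NbReplicas"].sum())

# How many datasets received 1, 2, 3, ... replicas
replica_counts = sample["NbReplicas"].value_counts().sort_index()

print(n_on_disk, replicas_on_disk)  # 3 8
print(replica_counts.to_dict())     # {1: 3, 3: 1, 4: 1}
```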


#Get report
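This method returns a combined report for each dataset: its popularity and label, its predicted intensity, its size (`LFNSize`) and the placement decision (`OnDisk`, `NbReplicas`, `Missing`). It takes the same optimization parameters as `get_opti_report`, plus a popularity cut, `pop_cut`.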

In [25]:
%%time
report = dpaw.get_report(q=None, set_replicas='auto', c_disk=100, c_tape=1, c_miss=2000,
                         alpha=1, max_replicas=4, pop_cut=0.5)

CPU times: user 49 ms, sys: 189 µs, total: 49.2 ms
Wall time: 649 ms


In [26]:
report.head(5)  # DataFrame.irow was removed from pandas; head(5) returns the first five rows

| Unnamed: 0.2 | Unnamed: 0 | Unnamed: 0.1 | Name | Popularity | Label | Intensity | LFNSize | OnDisk | NbReplicas | Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 670 | 670 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.019055 | 0 | 6.24164 | 0.3179 | 1 | 3 | 0 |
| 1 | 3152 | 3152 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.036589 | 0 | 32.695929 | 0.649204 | 1 | 4 | 0 |
| 2 | 7992 | 2808 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.351962 | 1 | 0.0 | 1.370105 | 1 | 1 | 0 |
| 3 | 5847 | 663 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.405495 | 1 | 0.0 | 0.09529 | 1 | 1 | 0 |
| 4 | 10331 | 5147 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.000638 | 0 | 0.180068 | 0.803981 | 1 | 1 | 0 |
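The report makes it possible to estimate how much disk space a placement needs. A sketch on the five rows shown, assuming `LFNSize` is the size of one copy of a dataset (its unit is not stated here), `NbReplicas` counts disk copies, and `Missing` is a 0/1 flag; in the notebook, use `report` instead of `sample`.

```python
import pandas as pd

# LFNSize, OnDisk, NbReplicas and Missing values copied from the five rows shown above.
sample = pd.DataFrame({
    "LFNSize": [0.3179, 0.649204, 1.370105, 0.09529, 0.803981],
    "OnDisk": [1, 1, 1, 1, 1],
    "NbReplicas": [3, 4, 1, 1, 1],
    "Missing": [0, 0, 0, 0, 0],
})

on_disk = sample["OnDisk"] == 1

# Disk footprint: size of one copy times the number of replicas,
# counted only for datasets placed on disk (same unit as LFNSize).
disk_usage = (sample.loc[on_disk, "LFNSize"] * sample.loc[on_disk, "NbReplicas"]).sum()

# Number of datasets flagged as Missing (assumed 0/1 flag)
n_missing = int(sample["Missing"].sum())

print(round(disk_usage, 6), n_missing)  # 5.819892 0
```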
