
Work for conducting analysis of the popular Alibaba trace dataset.

Alibaba Trace Analysis

Hardware

CloudLab c220g5

Dataset

GitHub

Instructions

  • Run build_aggregate_dependencies.py
    • This produces the called-by and calling graphs, the unique microservices, and the number of traces each microservice appears in, for all the data files.
    • This takes around 30 minutes to run on CloudLab.
  • Then, run aggregate_dependency_analysis.py
    • This produces the called-by and calling distributions, summary statistics such as the sparsity ratio, connected component sizes, and more, building on the output above.
    • This takes around 4 minutes to run on CloudLab.
  • Then, run trace_contiguity_analysis.py
    • This produces the files on which each trace occurs, as well as some information about the (lack of) contiguity of traces in the dataset.
    • This takes around 10 minutes to run on CloudLab.
  • Then, run trace_analysis.py
    • This produces statistics on errors in the trace files that were collected by the above file.
    • This takes around 15 minutes to run on CloudLab.
  • Then, run trace_plots.py
    • This produces plots from some of the statistics collected above.
    • This takes around 3 minutes to run on CloudLab.
  • Run get_nice_traces.py x to get x instances of nice traces (traces whose RPC IDs are unique; they may still have other issues).
  • Run sample_error_traces.py x to get x instances of traces with non-unique RPC IDs, x traces missing 1 microservice ID, and x traces missing 2 microservice IDs.
  • Then, run even_more_trace_analysis.py
    • This produces statistics on errors in the trace files that were collected by the above file.
    • This takes around 15 minutes to run on CloudLab.
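The "nice trace" criterion above (unique RPC IDs) amounts to a simple uniqueness check. The sketch below uses a hypothetical row format (a list of dicts with an 'rpc_id' key) as a stand-in for whatever the scripts actually parse from the trace files:

```python
from collections import Counter

def is_nice_trace(rows):
    """Return True when every RPC ID in a trace appears exactly once.

    `rows` is a hypothetical representation (list of dicts with an
    'rpc_id' key); the real scripts read rows from the Alibaba trace
    files, and a "nice" trace may still have other issues.
    """
    counts = Counter(row["rpc_id"] for row in rows)
    return all(c == 1 for c in counts.values())

# A duplicated RPC ID ("0.1") makes this trace non-nice.
bad = [{"rpc_id": "0.1"}, {"rpc_id": "0.1.1"}, {"rpc_id": "0.1"}]
good = [{"rpc_id": "0.1"}, {"rpc_id": "0.1.1"}]
print(is_nice_trace(bad), is_nice_trace(good))  # False True
```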

Graph Embedding & Eigenvectors

  • Run analyze_graphs.py to obtain eigenvectors for each root microservice, based on the call graph structure rooted at that microservice. The results are stored in a single JSON file named %{file_num}_pkl_fils_pca.json under results/embeddings/, where %{file_num} is the number of pkl files processed by the program.
  • Then run pca_plots.py, which plots a series of images for each microservice under results/embeddings, as well as one figure combining the eigenvectors of all root microservices.
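As an illustration of the eigenvector step, here is a minimal power-iteration sketch for the leading eigenvector of a symmetrized call-graph adjacency matrix. The graph, matrix representation, and method are assumptions for illustration; analyze_graphs.py's actual computation may differ:

```python
def leading_eigenvector(adj, iters=200):
    """Power iteration for the leading eigenvector of a symmetric
    adjacency matrix given as a list of lists.

    A minimal stand-in for the per-root-microservice eigenvector
    computation; the real script may use a library routine instead.
    """
    n = len(adj)
    v = [1.0 / n] * n  # uniform starting vector
    for _ in range(iters):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            return v
        v = [x / norm for x in w]
    return v

# Hypothetical symmetrized call graph: services 0, 1, 2 form a
# triangle and service 3 is called only by service 0.
adj = [[0, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 0],
       [1, 0, 0, 0]]
v = leading_eigenvector(adj)
# The best-connected service (0) gets the largest component and the
# leaf (3) the smallest, matching eigenvector-centrality intuition.
```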

Results files:

  • Generally, .pkl files hold objects constructed by an expensive job (e.g. aggregate dependency graphs).
  • .png files are plots.
  • .txt files hold statistics (check the errors/ subdirectory for those specifically pertaining to oddities in the data).
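The .pkl files can be reloaded in a later job with the standard pickle module; the path in the usage comment is hypothetical:

```python
import pickle

def load_result(path):
    """Load one pickled result object (e.g. an aggregate dependency
    graph) produced by one of the expensive jobs above."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical usage; substitute a real results file:
# graphs = load_result("results/aggregate_dependencies.pkl")
```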

Utilities files:

  • misc.py, collect_traces.py, and build_call_graph.py are modules with a variety of utilities used by the other files.
