MiguelBarriosAl/logs-pipeline

Logs Pipeline

Introduction

This project implements a tool that analyzes log files of online visits and collects a few metrics from the aggregated data in a memory- and CPU-efficient way. To keep processing performance good regardless of the size of the data, the file is read at offsets and split into chunks of a configurable size. The project consists of the following files and packages:

  • pipelines/pipeline.py: Entry point that starts the data processing
  • pipelines/load.py: Loads the data in chunks
  • pipelines/process.py: Processes the data by timestamp time unit
  • pipelines/tester.py: Analyzes invalid data
  • pipelines/data: Folder where the .csv file is stored
  • tests/test_tester.py: Unit tests
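
The chunked loading described in the introduction can be sketched roughly as follows. This is a minimal illustration, not the code in pipelines/load.py: the function name `iter_chunks`, the `chunk_size` parameter, and the column names in the sample data are all assumptions made for the example.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_chunks(handle: TextIO, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk_size` rows, so only one chunk is in memory at a time."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical sample data; the real column names are not documented in this README.
sample = io.StringIO(
    "timestamp,campaign,source,medium\n"
    "1667233287,early_buyers,google,social\n"
    "1667233287,early_buyers,google,social\n"
    "1667236947,buyers_lead,bbva,social\n"
)

chunk_sizes = [len(chunk) for chunk in iter_chunks(sample, chunk_size=2)]
print(chunk_sizes)  # [2, 1]
```

Because the generator reads rows lazily, peak memory depends on `chunk_size` rather than on the size of the file.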

Requirements

  • OS == Windows 10
  • Python == 3.9.7
  • Pip == 21.3.1

Installation and Run

  • Install Python 3.9 (the apt commands below apply to Debian/Ubuntu)

      sudo apt update
    
      sudo apt install python3.9
    
      python3.9 --version
    
  • Clone the repository

      git clone https://github.com/MiguelBarriosAl/logs-pipeline.git

      cd logs-pipeline
    
  • Run the unit tests

       python3 -m unittest tests/test_tester.py
    
  • Run the pipeline

       python3 pipelines/pipeline.py
    

A Simple Example

      python3 pipelines/pipeline.py

      {'1667233287': 'early_buyers', '1667236947': 'buyers_lead'}
      {'1667233287': 'google', '1667236947': 'bbva'}
      {'1667233287': 'social', '1667236947': 'social'}
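
Each dictionary in the output maps a Unix timestamp to a campaign, source, or medium value. The README does not state how the value for each timestamp is chosen, so the sketch below assumes it is the most frequent value seen for that timestamp. The function name `top_value_per_timestamp`, the column names, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def top_value_per_timestamp(rows: Iterable[Mapping[str, str]], field: str) -> Dict[str, str]:
    """Return, for each timestamp, the most frequent value of `field` (an assumed metric)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        counts[row["timestamp"]][row[field]] += 1
    return {ts: counter.most_common(1)[0][0] for ts, counter in counts.items()}


# Hypothetical rows; the real log schema may differ.
rows = [
    {"timestamp": "1667233287", "campaign": "early_buyers", "source": "google", "medium": "social"},
    {"timestamp": "1667233287", "campaign": "early_buyers", "source": "bbva", "medium": "social"},
    {"timestamp": "1667233287", "campaign": "buyers_lead", "source": "google", "medium": "social"},
    {"timestamp": "1667236947", "campaign": "buyers_lead", "source": "bbva", "medium": "social"},
]

campaigns = top_value_per_timestamp(rows, "campaign")
sources = top_value_per_timestamp(rows, "source")
mediums = top_value_per_timestamp(rows, "medium")
print(campaigns)
print(sources)
print(mediums)
```

With these sample rows the three printed dictionaries have the same shape as the output shown above.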

Recommendations

For production use, the plan is to introduce multithreading with 4 threads: one thread for each method (campaign, source, medium) and one more for false-user analysis.
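
A rough sketch of that four-thread layout, using the standard library's `concurrent.futures`, is shown here. The helper functions `most_common_value` and `count_false_users`, the `user` column, and the rule that an empty user is a false user are placeholders invented for the example; they are not the project's actual methods.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows; an empty "user" stands in for a false user.
ROWS = [
    {"user": "u1", "campaign": "early_buyers", "source": "google", "medium": "social"},
    {"user": "", "campaign": "early_buyers", "source": "google", "medium": "social"},
    {"user": "u2", "campaign": "buyers_lead", "source": "bbva", "medium": "social"},
]


def most_common_value(field: str) -> str:
    """Placeholder metric: the most frequent value of one column."""
    return Counter(row[field] for row in ROWS).most_common(1)[0][0]


def count_false_users() -> int:
    """Placeholder check: count rows whose user field is empty."""
    return sum(1 for row in ROWS if not row["user"])


# Four workers: one per metric (campaign, source, medium) plus one for false users.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {name: pool.submit(most_common_value, name)
               for name in ("campaign", "source", "medium")}
    futures["false_users"] = pool.submit(count_false_users)
    results = {name: future.result() for name, future in futures.items()}

print(results)
```

In CPython 3.9 the global interpreter lock prevents threads from running Python bytecode in parallel, so if the metrics turn out to be CPU-bound, `ProcessPoolExecutor` from the same module may be worth evaluating as an alternative.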
