antoniod20/dvc-twitter

Reproducible pipeline from Twitter API using DVC

In this project I built a DVC pipeline from a notebook I had previously created, called Twitter API. Because of the notebook's size, I put only the most important parts of my work into the pipeline. These parts are:

  • Creation of the dataset (I reduced its size to keep the run time short)
  • Creation of a NetworkX graph
  • Generation of the image of the graph
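As a rough illustration of the graph step, the sketch below builds a small NetworkX graph and extracts an ego network from it. The edge list and the user names are hypothetical placeholders, not data from this project, and the code is not taken from the pipeline's own scripts.

```python
import networkx as nx

# Hypothetical edge list standing in for the relationships fetched from Twitter.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("dave", "erin"),
]

# Build an undirected graph from the edge list.
graph = nx.Graph()
graph.add_edges_from(edges)

# The ego network of a node is the node plus its neighbours within `radius`
# hops, together with the edges among them.
ego = nx.ego_graph(graph, "alice", radius=1)

print(sorted(ego.nodes()))
print(ego.number_of_edges())
```

A graph like this could then be drawn with `nx.draw` and written to an image file through Matplotlib, which is one plausible way to implement the image step.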

The pipeline graph is the following:

  +-------+
  | fetch |
  +-------+
       *
       *
       *
  +-------+
  | graph |
  +-------+
       *
       *
       *
+------------+
| egonetwork |
+------------+
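
The diagram reads top to bottom: each stage depends on the one above it. The sketch below shows how an execution order follows from those dependencies. It only mirrors the diagram and is not how DVC is implemented; `STAGE_DEPENDENCIES` and `execution_order` are illustrative names, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on, as read from the diagram.
STAGE_DEPENDENCIES = {
    "fetch": set(),
    "graph": {"fetch"},
    "egonetwork": {"graph"},
}


def execution_order(dependencies):
    """Return the stages in an order that respects their dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(STAGE_DEPENDENCIES)
print(" -> ".join(order))
```

Because the stages form a single chain, only one order is valid: a stage can run only after the stage it depends on has finished.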

Setup

Download

To download the project, clone the repository:

git clone https://github.com/antoniod20/dvc-twitter.git

Configuration

The project was developed with Python 3.6.9, so a Python 3 interpreter is recommended. To install the libraries the project needs, run:

pip install -r src/requirements.txt

Run

To launch the pipeline, run the following commands in order:

  • Change into the pipeline directory (the path below uses Windows separators)
cd .\dvc-twitter\dvc-twitter-api\
  • Reproduce the pipeline
dvc repro

Resources & Libraries

  • Tweepy - Python client library for the Twitter API
  • NetworkX - Useful to handle the study of graphs and networks
  • Pandas - Useful to handle the CSV file
  • Matplotlib - Provides functions for embedding plots into applications

Author
