This is a Flink-based project that uses Kafka to simulate a real incoming stream of Twitter data into Flink. The data is parsed and processed, then fed through a pipeline of logical subdivisions that Flink calls windows. The result of the pipeline is the set of users that contribute the most to the biggest star topologies present in the network graph. This is achieved by calculating the out-degree centrality of each user; users with an out-degree higher than 1000 then pass through a filter, so that persistent stars can be detected across the period of a month.
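Conceptually, the core of the pipeline looks like the following minimal sketch, assuming a DataStream<Tuple2<String, String>> of (source, target) edges called edges; the variable names and the one-day window here are illustrative assumptions, not the project's exact code.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Count outgoing edges per user within each window (out-degree centrality),
// then keep only users whose out-degree exceeds 1000.
DataStream<Tuple2<String, Integer>> starCenters = edges
        .map(edge -> Tuple2.of(edge.f0, 1))
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        .keyBy(t -> t.f0)                                        // group by source user
        .window(TumblingProcessingTimeWindows.of(Time.days(1)))  // one logical subdivision
        .sum(1)                                                  // out-degree per window
        .filter(t -> t.f1 > 1000);                               // large-star threshold
```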
To feed the data into Flink, I use a bash script that reads a month's worth of data and writes it to the Kafka stream. Path variables need to be changed to run locally on another machine; I have marked them with @pathChange.
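For illustration, here is a hypothetical Java equivalent of what the script does (the real script is bash; the class name, file path, and producer settings below are assumptions):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Read a month's worth of parsed data line by line and publish each line
// as one record on the quickstart-events topic.
public class StreamFeeder {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Path data = Path.of("/path/to/month.csv"); // @pathChange: set to the local data file
            Files.lines(data).forEach(line ->
                    producer.send(new ProducerRecord<>("quickstart-events", line)));
        }
    }
}
```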
- Install Kafka locally from https://kafka.apache.org/downloads. Kafka 2.8.0 was used in the making of this project.
- On a new terminal run
  bin/zookeeper-server-start.sh config/zookeeper.properties
- On another terminal run
  bin/kafka-server-start.sh config/server.properties
- On another terminal, create a topic named quickstart-events by running
  bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
  This step (topic creation) has to be performed only once.
- Run the Maven project StreamEnv_lessStrictStarDetectionTest.java in IntelliJ (a sketch of how the job consumes the topic follows these steps).
- On a terminal run the script resources/startStream.sh
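The snippet below is a minimal sketch of how the streaming job could be wired to the quickstart-events topic; the actual source setup lives in StreamEnv_lessStrictStarDetectionTest.java, so the group id and property names here are assumptions.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Connect to the local broker started in the steps above.
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "star-detection"); // illustrative group id

// Each Kafka record is one line of the simulated Twitter stream.
DataStream<String> tweets = env.addSource(
        new FlinkKafkaConsumer<>("quickstart-events", new SimpleStringSchema(), props));
```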
A Maven assembly plug-in included in this project can produce a jar file by running the install command in the Maven lifecycle.
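For example, with standard Maven usage, from the project root:
mvn clean install
The assembled jar is then placed under target/.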
The project also contains an algorithm that works on static graphs, using Flink's DataSet API and a simple execution environment, to detect strict star topologies; it is located in starDetectionTest.java. In contrast, StreamEnv_lessStrictStarDetectionTest.java uses a stream environment and streaming solutions. The main class of this project also contains a flatMap implementation called ExistingStar that performs the static persistent-user star search (an illustrative sketch follows below). The parsing script used is also included in the resources/parseAllToCSV.py file.
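To make the idea concrete, here is an illustrative flatMap in the spirit of ExistingStar: it emits a star center only if the same user was already seen in an earlier window. This is a sketch, not the project's implementation; it keeps state in a plain in-memory set, which assumes parallelism 1 (the real code may use Flink's managed state instead).

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Passes through (user, outDegree) pairs only when the user was already a
// star center in a previous window, i.e. the star persists over time.
public class ExistingStarSketch
        implements FlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> {

    private final Set<String> knownCenters = new HashSet<>();

    @Override
    public void flatMap(Tuple2<String, Integer> center,
                        Collector<Tuple2<String, Integer>> out) {
        if (knownCenters.contains(center.f0)) {
            out.collect(center);      // seen before: the star persists
        }
        knownCenters.add(center.f0);  // remember this center for later windows
    }
}
```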
- If you are having issues related to Kafka logs, you can delete them without breaking anything by running
  rm -rf /tmp/kafka-logs
- If you are having issues with the address Kafka is listening on, try changing the listeners entry in the server.properties config file to localhost.
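  For example, the relevant line in config/server.properties would then look like:
  listeners=PLAINTEXT://localhost:9092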
Distributed under the MIT License. See LICENSE for more information.