SIRITVIS

Social Interaction Research Insights Topic Visualisation

📋 Summary

SIRITVIS is a Python package for analysing text from social media platforms such as Instagram and Reddit, or from any other text source. It applies topic models to uncover latent themes in large text collections, and it includes tools for gathering, cleaning, analysing, and visualising the data. Results can be plotted on a map that shows where topics are being discussed and how often they are mentioned.

SIRITVIS builds on established methods from data science, machine learning, and mapping. It cleans the data, trains topic models on it, and lets you evaluate topic quality with built-in metrics such as topic diversity, inverted RBO, and Jaccard similarity. The package also includes visual tools that show the distribution of topics on a map.

A key feature of SIRITVIS is its ability to show where on a world map people are talking about different topics. It can categorize these places by the sentiment of the posts, such as positive, negative, or neutral. You can also search for specific keywords and see where they appear on the map.

SIRITVIS is helpful in various areas, like marketing, politics, and disaster response, by providing tools to analyze the spread of topics. It helps users understand their audience better and make informed decisions based on the analysis of social media data.

📝 How to cite

Narwade, S., Kant, G., Säfken, B., and Leiding, B. (2023), SIRITVIS: Social Interaction Research Insights Topic Visualisation. Journal of Open Source Software, https://joss.theoj.org/papers/b51be70e9634e45d8035ee20b6147d76.

Advisory

Use Python version '>=3.10, <3.11'.
Use an IDE such as Visual Studio or a platform such as Google Colab for better plot rendering.
Refer to the provided sample dataset to see the expected input format.
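
The version constraint above can be checked before installing. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of SIRITVIS.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter satisfies '>=3.10, <3.11'.

    Hypothetical helper for illustration; it is not part of SIRITVIS.
    """
    major, minor = version_info[0], version_info[1]
    return (major, minor) == (3, 10)


if not is_supported_python():
    # Only warn: the package may still import, but it is tested on 3.10.
    print(
        f"Warning: Python {sys.version_info[0]}.{sys.version_info[1]} detected; "
        "SIRITVIS is tested on Python 3.10."
    )
```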

💡 Features

Data Streaming 💾
Data Cleaning 🧹
Topic Model Training and Evaluation 🎯
Topic Visual Insights 🔍
Trending Topic Geo Visualisation 🌏

🛠 Installation

Attention: SIRITVIS is built for Python 3.10, and its visualisations work best in Python notebooks. The package has been tested under these conditions. For best compatibility, we recommend setting up a fresh (conda) environment with Python 3.10.10.

The package can be installed via pip:

pip install SIRITVIS
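
To confirm that the installation succeeded, the installed version can be looked up with the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and the distribution name is assumed to match the one used with pip above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("SIRITVIS")
print("SIRITVIS version:", version if version else "not installed")
```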

👩‍💻 Usage ([documentation])

Import Libraries

from SIRITVIS import insta_streamer, reddit_streamer, cleaner, topic_model, topic_visualise, topic_mapper

Streaming Reddit Data

For authentication with the Reddit Streaming API, follow the steps outlined in this tutorial.

# Run the streaming process to retrieve raw data based on the specified keywords

client_id = "XXXXXXXXXX"
client_secret = "XXXXXXXXX"
user_agent = "XXXXXXXXXX"
keywords = ['Specific', 'Keywords']  # default is None; use multiple keywords for a more varied dataset
save_path = '../folder/path/to/store/the/data/'
raw_data = reddit_streamer.RedditStreamer(client_id, client_secret, user_agent, save_path, keywords).run()
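
Rather than pasting credentials into a notebook, they can be read from environment variables. The sketch below assumes the hypothetical variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT`; SIRITVIS itself does not define or read them.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Read Reddit API credentials from environment variables.

    The variable names are assumptions made for this example only.
    """
    names = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Returned in the order (client_id, client_secret, user_agent).
    return tuple(env[name] for name in names)
```

The returned tuple can be unpacked into `client_id`, `client_secret`, and `user_agent` and passed to `RedditStreamer` in place of the placeholder strings.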

Streaming Instagram Data

For authentication with the Instagram Streaming API, sign up on Apify.

# Run the streaming process to retrieve raw data based on the specified keywords

api_token = 'apify_api_XXXXXXXXX'
save_path = '../folder/path/to/store/the/data/'
instagram_username = 'XXXXXXXXX'
instagram_password = 'XXXXXXXXX'
hashtags = ['Specific', 'Keywords']  # default is ['instagram']; use multiple hashtags for a more varied dataset
limit = 20  # number of post captions to extract; default is 100
raw_data = insta_streamer.InstagramStreamer(api_token, save_path, instagram_username, instagram_password, hashtags, limit).run()
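
Both streamers take a `save_path`. Whether they create a missing folder is not stated here, so it may be safer to create it beforehand. A minimal sketch, with `ensure_directory` as a hypothetical helper name:

```python
from pathlib import Path


def ensure_directory(path):
    """Create the directory (and any missing parents) if it does not exist."""
    directory = Path(path).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Calling it on an existing folder is harmless, so it can run at the top of every notebook session.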

Clean Streamed Data or Any External Text Data

# The raw_data variable can also be passed as the data_source argument
cleaner_obj = cleaner.Cleaner(data_source='../folder/path/or/csv/file/path/to/load/data/')
# cleaner_obj.clean_data     # get cleaned dataset without saving it
cleaned_file = cleaner_obj.saving('../folder/path/to/store/the/cleaned/data/',data_save_name='dataset_file_name')
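
For readers unfamiliar with what cleaning social media text involves, the sketch below shows a few typical steps: lowercasing, then removing URLs, user mentions, and non-letter characters. It is an illustration only and is not the implementation used by `cleaner.Cleaner`; the function name `basic_clean` is hypothetical.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def basic_clean(text):
    """Apply a few common cleaning steps to one post (illustrative only)."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)         # drop links
    text = MENTION_PATTERN.sub(" ", text)     # drop @user mentions
    text = NON_LETTER_PATTERN.sub(" ", text)  # drop digits, punctuation, emoji
    return " ".join(text.split())             # collapse repeated whitespace


print(basic_clean("Check https://example.com NOW, @user! #Floods2023"))
```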

Train a topic model on a corpus of short texts

Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)
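
The size recommendation can be checked programmatically before training. In this sketch, 500 KB is interpreted as 500 * 1024 bytes, which is an assumption, and `is_large_enough` is a hypothetical helper name.

```python
import os

MIN_BYTES = 500 * 1024  # 500 KB, interpreted as kibibytes (an assumption)


def is_large_enough(path, min_bytes=MIN_BYTES):
    """Return True when the file at `path` has at least `min_bytes` bytes."""
    return os.path.getsize(path) >= min_bytes
```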

# The cleaned_file variable can also be passed as the dataset_source argument

model = topic_model.TopicModeling(
    num_topics=10, dataset_source='../csv/file/path/to/load/data.csv',
    learning_rate=0.001, batch_size=32, activation='softplus', num_layers=3, num_neurons=100,
    dropout=0.2, num_epochs=100, save_model=False, model_path=None, train_model='NeuralLDA',
    evaluation=['topicdiversity', 'invertedrbo', 'jaccardsimilarity'])

saved_model = model.run()
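
Two of the evaluation metrics named above can be illustrated in a few lines. The sketch below uses common textbook definitions of topic diversity (the share of unique words among the top words of all topics) and the mean pairwise Jaccard similarity between topics. The library's own implementation may differ in detail, and the function names here are hypothetical.

```python
from itertools import combinations


def topic_diversity(topics, topk=10):
    """Fraction of unique words among the top-k words of all topics."""
    top_words = [topic[:topk] for topic in topics]
    unique_words = {word for topic in top_words for word in topic}
    total_words = sum(len(topic) for topic in top_words)
    return len(unique_words) / total_words


def mean_jaccard_similarity(topics, topk=10):
    """Average Jaccard overlap of top-k words over all topic pairs."""
    scores = []
    for first, second in combinations(topics, 2):
        set_a, set_b = set(first[:topk]), set(second[:topk])
        scores.append(len(set_a & set_b) / len(set_a | set_b))
    # With fewer than two topics there are no pairs to compare.
    return sum(scores) / len(scores) if scores else 0.0


example_topics = [["flood", "rain", "river"], ["rain", "storm", "wind"]]
print(topic_diversity(example_topics, topk=3))
print(mean_jaccard_similarity(example_topics, topk=3))
```

Higher diversity and lower pairwise similarity both indicate topics that overlap less with each other.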

Topic Insights Visualisation

To investigate the internal structure of topics and their relations to words and individual documents, we recommend using pyLDAvis.
Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)

# The cleaned_file variable can also be used as the data_source argument

vis_model = topic_visualise.PyLDAvis(data_source='../csv/file/path/to/load/data.csv',num_topics=5,text_column='text')
vis_model.visualize()

A word cloud is a graphical display of text data in which the size of each word reflects its frequency or significance within the text.

Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)

# The cleaned_file variable can also be used as the data_source argument
# The word cloud may take a while to appear.

vis_model = topic_visualise.Wordcloud(data_source='../csv/file/path/to/load/data.csv',text_column='text',save_image=False)
vis_model.visualize()
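
A word cloud is driven by word frequencies. The sketch below shows how such counts can be computed with the standard library; it is an illustration of the idea and not the code behind `topic_visualise.Wordcloud`. The function name `word_frequencies` is hypothetical.

```python
from collections import Counter


def word_frequencies(documents, top_n=5):
    """Count whitespace-separated words across documents, most frequent first."""
    counts = Counter()
    for document in documents:
        counts.update(document.lower().split())
    return counts.most_common(top_n)


posts = ["flood warning river", "river flood", "flood"]
print(word_frequencies(posts, top_n=2))
```

In a rendered cloud, the words with the highest counts would be drawn largest.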

📣 Community guidelines

We encourage and welcome contributions to the SIRITVIS package. If you have any questions, want to report bugs, or have ideas for new features, please file an issue.

We also appreciate pull requests via GitHub. Contributions would be especially valuable in two areas: improving topic quality when the models are trained on noisy data from Reddit, Instagram, or external sources, and making the topic_mapper function more interactive and independent of the notebook environment.

🖊️ Authors

Sagar Narwade
Gillian Kant
Benjamin Säfken
Benjamin Leiding

🎓 References

This project uses OCTIS ¹, a library by Terragni et al. for training and evaluating topic models, and pyLDAvis ², a Python library by Ben Mabey for interactive topic model visualisation. We thank the authors of both libraries, which provide essential functionality for SIRITVIS.

📜 License

Copyright (c) 2023 Sagar Narwade. This software is released under the MIT License.

¹ OCTIS
² pyLDAvis
