A project that uses unsupervised and supervised learning to analyze the level of misinformation in articles. It clusters unlabeled articles into 10 groups based on content, trains a model on a pre-labeled dataset to classify articles as "real" or "fake," and then runs that model on each of the 10 groups for analysis.
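The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn, TF-IDF features, k-means for the clustering step, and logistic regression for the classifier, none of which are confirmed by this README. The function names (`cluster_articles`, `train_classifier`, `fake_share_by_cluster`) are hypothetical.

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def cluster_articles(texts, n_clusters=10, seed=0):
    """Group unlabeled articles by content; returns one cluster id per article.

    Assumption: TF-IDF + k-means stand in for whatever the project really uses.
    """
    features = TfidfVectorizer(stop_words="english").fit_transform(texts)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(features).tolist()


def train_classifier(texts, labels):
    """Fit a "real"/"fake" text classifier on the pre-labeled dataset.

    Assumption: logistic regression over TF-IDF features.
    """
    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


def fake_share_by_cluster(model, texts, cluster_ids):
    """Run the classifier on the clustered articles.

    Returns a dict mapping each cluster id to the fraction of its
    articles that the model predicts as "fake".
    """
    predictions = model.predict(texts)
    totals = Counter(cluster_ids)
    fakes = Counter(c for c, p in zip(cluster_ids, predictions) if p == "fake")
    return {c: fakes[c] / totals[c] for c in totals}
```

With the real datasets, `cluster_articles` would be called with the default of 10 clusters on the unlabeled articles, and `fake_share_by_cluster` would then give a per-group misinformation estimate to compare across groups.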
To use this project, download the dataset files separately and move them into the /unsupervised/datasets/ and /supervised/datasets/ folders. The files are too large to host on GitHub; they can be found here: Google Drive.
They are also available from their original sources on Kaggle: Unsupervised, Supervised.
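Before running anything, it can help to confirm that the datasets are in place. The sketch below assumes the two folders are relative to the repository root, and it only checks that each folder exists and contains at least one file, since this README does not name the dataset files. The helper `missing_dataset_dirs` is hypothetical and not part of the project.

```python
from pathlib import Path

# Folder names taken from the setup instructions, relative to the repo root (assumed).
DATASET_DIRS = ("unsupervised/datasets", "supervised/datasets")


def missing_dataset_dirs(repo_root="."):
    """Return the dataset folders that are absent or contain no files."""
    root = Path(repo_root)
    missing = []
    for rel in DATASET_DIRS:
        folder = root / rel
        has_files = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_files:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_dataset_dirs():
        print(f"No dataset files found in {rel}/")
```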