Nihir2904/DailyDoseOfDadJokes

Jokes Clustering Project

This project aims to cluster jokes into different categories using unsupervised learning techniques.

Project Overview

In this project, I performed the following tasks:

  • Data Cleaning and Preprocessing
  • Feature Extraction using TF-IDF Vectorizer
  • Clustering using KMeans Algorithm
  • Visualization using Network Graph and Parallel Coordinates

Dataset

The dataset used in this project is provided in dataset.csv. It contains a list of jokes in plain-text format. Thanks to Arya Shah for building this dataset.
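As an illustration of the data cleaning step, the sketch below reads jokes from a CSV and normalizes them using only the standard library. The column name `joke`, the helper names, and the cleaning rules (lowercasing, stripping punctuation, dropping duplicates) are assumptions for the example; the actual layout of dataset.csv and the notebook's cleaning may differ.

```python
import csv
import io
import re


def clean_joke(text):
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def load_jokes(handle, column="joke"):
    """Read one text column from a CSV, skipping empty rows and duplicates."""
    seen, jokes = set(), []
    for row in csv.DictReader(handle):
        cleaned = clean_joke(row.get(column) or "")
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            jokes.append(cleaned)
    return jokes


# Hypothetical in-memory sample standing in for dataset.csv.
sample = io.StringIO(
    'joke\n'
    '"I\'m reading a book on anti-gravity. It\'s impossible to put down!"\n'
    '"I\'m reading a book on anti-gravity. It\'s impossible to put down!"\n'
    '""\n'
    '"What do you call fake spaghetti?   An impasta."\n'
)
jokes = load_jokes(sample)
print(jokes)
```

To use it on the real file, pass an open handle instead of the sample, for example `open("dataset.csv", newline="", encoding="utf-8")`, and adjust `column` to match the file's header.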

Requirements

The project was implemented using Python 3.9.

How to Use

To run the project, open and run the your-dad-joked-once.ipynb notebook in the root directory. It preprocesses the data, performs clustering and topic modeling, and generates the visualizations.

Alternatively, run the notebook on Kaggle: https://www.kaggle.com/code/nihirshah/your-dad-joked-once (upvote if you believe in god, and upvote if you don't).

Results

The KMeans algorithm grouped the jokes into 5 clusters, each with different attributes and features.

Future Work

  • Explore other clustering algorithms and compare their performance
  • Topic Modeling using Latent Dirichlet Allocation (LDA)
  • Expand the dataset to include more jokes and categories
