
This project involved exploring object classification using machine learning models from scikit-learn, with a focus on dimensionality reduction to simplify data complexity. Additionally, a comprehensive data narrative was created to illustrate the process and insights gained, utilizing Python libraries for data analysis and visualization.


Mrugank97/PyNarrative


PyNarrative

Methodology For Data Narrative

  1. Data Exploration: We will begin by importing the dataset using pandas and cleaning any missing values or inconsistencies.
  2. Question Formulation: We will formulate a series of scientific questions to investigate various aspects of the data.
  3. Data Analysis: We will explore and analyze the data to uncover patterns and insights.
  4. Reporting and Interpretation: We will present our findings in a clear and concise manner. This will include the formulated questions, applied methods (code snippets), and generated visualizations (plots, tables) to support our observations.
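The data exploration step can be sketched as follows. This is a minimal illustration, not the code used in the narratives: the inline CSV, its column names, and the helper `load_and_clean` are all hypothetical, and dropping every incomplete row is only one of several reasonable cleaning choices.

```python
import io

import pandas as pd

# Hypothetical stand-in for a real CSV file; the column names are assumptions.
RAW_CSV = """title,average_rating,publication_year
Book A,4.2,1999
Book B,,2005
Book A,4.2,1999
Book C,3.8,
Book D,4.5,2012
"""


def load_and_clean(source):
    """Read a CSV, drop exact duplicate rows, then drop rows with missing values."""
    df = pd.read_csv(source)
    df = df.drop_duplicates()
    df = df.dropna()
    return df.reset_index(drop=True)


clean = load_and_clean(io.StringIO(RAW_CSV))
print(clean)
```

Whether to drop or impute missing values depends on the question being asked, so in practice this helper would be adapted per dataset.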

Data Narrative 1

Project Description: Unveiling Book Trends and User Preferences with Goodreads Data

This project explores the Goodreads-10k dataset, which contains information on 10,000 books. We use Python libraries such as pandas to investigate the questions listed below and surface trends in the data.

Our Goals:

  • Uncover insights into book popularity and user preferences based on average rating and genre.
  • Analyze publishing trends across different historical periods.
  • Identify potential relationships between book categories and average ratings.
  • Explore the role of tags in user interest and discoverability.
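As a rough sketch of how the publishing-era goal above might be approached, the example below bins books by publication year and compares average ratings per era. The sample rows are made up, the column names `original_publication_year` and `average_rating` are assumptions about the dataset, and the era boundaries are arbitrary.

```python
import pandas as pd

# Tiny made-up sample; titles, years, and ratings are illustrative only.
books = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "original_publication_year": [1850, 1925, 1999, 2005, 2012],
    "average_rating": [3.9, 4.1, 4.0, 3.7, 4.3],
})

# Era boundaries are an arbitrary choice for illustration.
bins = [0, 1900, 2000, 2100]
labels = ["before 1900", "1900-1999", "2000 onward"]
books["era"] = pd.cut(
    books["original_publication_year"], bins=bins, labels=labels, right=False
)

# Number of books and mean rating per era.
era_summary = books.groupby("era", observed=True)["average_rating"].agg(
    ["count", "mean"]
)
print(era_summary)
```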

Outcome:

  • Uniqueness and Novelty: This project aims to be distinctive in two ways:
    • Focus on User Preferences: Combining average rating analysis with tag exploration to understand what users find appealing.
    • Comparative Analysis: Comparing publishing trends across different eras to identify potential shifts in reader interests.

The resulting data narrative will offer valuable insights into the world of books. It can benefit authors, publishers, and book-recommendation platforms by providing a data-driven understanding of popular genres, historical publishing trends, and the role of user tags in book discoverability.

Data Narrative 2

Project Description: Unveiling Education Trends

This data narrative covers two datasets on US colleges (aaup and usnews).

I. Dataset 1: College Professor Statistics

  • This dataset reports, for each college and its state, the number of professors of each type and their typical salaries.
  • The data can be utilized for various purposes such as calculating the professor to student ratio, analyzing faculty diversity, comparing college statistics, and more.

II. Dataset 2: US University Data

  • This dataset contains comprehensive information on universities in the US, including their FICE code, name, state, public/private status, SAT and ACT scores, enrollment details, tuition fees, room and board costs, and more.
  • It can be used for tasks like trend analysis, comparison of college statistics, data-driven decision making for educational institutions, and assessing the overall landscape of higher education in the US.

Project Goals

  • Combine and analyze both datasets to gain insights into the education sector.
  • Explore trends and patterns in college and university statistics.
  • Perform comparative analysis across colleges and universities.
  • Utilize the data for data-driven decision making in education policy and institutional management.
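Combining the two datasets could look roughly like the sketch below. It assumes both tables share a FICE code column named `fice`; the text only states that the US university data includes a FICE code, so the shared key, all other column names, and all values here are hypothetical.

```python
import pandas as pd

# Made-up rows; the shared "fice" key and all other columns are assumptions.
aaup = pd.DataFrame({
    "fice": [1001, 1002, 1003],
    "avg_salary_full_prof": [700, 820, 650],
})
usnews = pd.DataFrame({
    "fice": [1001, 1002, 1004],
    "name": ["College A", "College B", "College D"],
    "out_of_state_tuition": [12000, 18000, 9000],
})

# An outer merge keeps unmatched colleges; the indicator column records
# whether each row came from one table or both.
merged = aaup.merge(usnews, on="fice", how="outer", indicator=True)

# Colleges present in both datasets are the ones usable for joint analysis.
matched = merged[merged["_merge"] == "both"]
print(matched)
```

Inspecting the `_merge` column before filtering is a quick way to see how many colleges fail to match between the two sources.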

Data Narrative 3

Project Description: Tennis Major Tournament Match Statistics

  • This dataset contains detailed information on matches played in the four major tennis tournaments.
  • It includes various data points such as player names, match results, and diverse match statistics like first serve percentage, aces, winners, and more.
  • With a total of 50 columns, the dataset offers comprehensive insights into tennis match dynamics and player performance.

Project Goals

  • Analyze match trends and patterns across different tournaments.
  • Investigate factors contributing to match outcomes and player success.
  • Explore correlations between match statistics and player rankings.
  • Utilize the data to develop predictive models for match outcomes or player performance.
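One possible starting point for the outcome-related goals above is to correlate per-match statistic differences with the result. The sketch below uses invented numbers and hypothetical column names (`player1_won`, `first_serve_pct_1`, and so on), so its output says nothing about real tennis matches.

```python
import pandas as pd

# Invented match rows; column names are assumptions about the dataset layout.
matches = pd.DataFrame({
    "player1_won": [1, 0, 1, 1, 0, 0],
    "first_serve_pct_1": [68, 55, 72, 61, 58, 60],
    "first_serve_pct_2": [60, 63, 65, 59, 66, 64],
    "aces_1": [10, 3, 12, 7, 4, 5],
    "aces_2": [6, 8, 5, 6, 9, 7],
})

# Express each statistic as player 1's advantage over player 2.
matches["first_serve_diff"] = (
    matches["first_serve_pct_1"] - matches["first_serve_pct_2"]
)
matches["aces_diff"] = matches["aces_1"] - matches["aces_2"]

# Pearson correlation of each advantage with the match result.
correlations = matches[["first_serve_diff", "aces_diff"]].corrwith(
    matches["player1_won"]
)
print(correlations)
```

Correlation alone does not establish which statistics drive outcomes; it only suggests which features may be worth including in a predictive model.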

Mini Projects

  1. Dimensionality Reduction for K-means Clustering: A Comparative Analysis Using PCA
  2. Evaluating Gaussian Naive Bayes for Digit Classification with Error Analysis
  3. Exploring K-Means Clustering for Handwritten Digits
  4. Impact of Noise on K-Means Clustering and the Role of PCA
  5. Implementing and Evaluating a Custom K-Nearest Neighbors Classifier
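To illustrate the idea behind the last mini project, here is a minimal sketch of a k-nearest neighbors classifier written with NumPy. It is not the repository's implementation: the class name `SimpleKNN`, the use of Euclidean distance, majority voting, and the toy data are all assumptions made for this example.

```python
from collections import Counter

import numpy as np


class SimpleKNN:
    """Minimal k-nearest neighbors classifier using Euclidean distance."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: fitting only stores the training data.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Pairwise distances, shape (n_queries, n_train).
        dists = np.linalg.norm(
            X[:, None, :] - self.X_train[None, :, :], axis=2
        )
        # Indices of the k closest training points for each query.
        nearest = np.argsort(dists, axis=1)[:, : self.k]
        # Majority vote among the neighbors' labels.
        preds = [
            Counter(self.y_train[row].tolist()).most_common(1)[0][0]
            for row in nearest
        ]
        return np.array(preds)


# Toy data: two well-separated clusters with labels 0 and 1.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]

model = SimpleKNN(k=3).fit(X_train, y_train)
predictions = model.predict([[0.5, 0.5], [5.5, 5.5]])
print(predictions)
```

Computing all pairwise distances at once is simple but memory-hungry; for larger datasets a tree-based or batched search would be a more practical choice.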

Mentor: Prof. Shanmughanathan Raman, IIT Gandhinagar
