Methodology For Data Narrative
- Data Exploration: We will begin by importing the dataset using pandas and cleaning any missing values or inconsistencies.
- Question Formulation: We will formulate a series of scientific questions to investigate various aspects of the data.
- Data Analysis: We will explore and analyze the data to uncover patterns and insights relevant to each question.
- Reporting and Interpretation: We will present our findings in a clear and concise manner. This will include the formulated questions, applied methods (code snippets), and generated visualizations (plots, tables) to support our observations.
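The import-and-clean step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the inline CSV stands in for the real file, and the column names (`title`, `authors`, `average_rating`, `original_publication_year`) and the helper `load_and_clean` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; column names are assumptions.
RAW_CSV = """title,authors,average_rating,original_publication_year
Book A,Author One,4.2,1999
Book B,Author Two,,2005
Book A,Author One,4.2,1999
Book C,,3.8,
"""


def load_and_clean(source):
    """Read a CSV and apply a few basic cleaning steps (illustrative only)."""
    df = pd.read_csv(source)
    # Remove exact duplicate rows.
    df = df.drop_duplicates()
    # Rows without a rating cannot contribute to rating analysis.
    df = df.dropna(subset=["average_rating"])
    # Keep rows with a missing author, but mark the gap explicitly.
    df = df.assign(authors=df["authors"].fillna("Unknown"))
    return df.reset_index(drop=True)


books = load_and_clean(io.StringIO(RAW_CSV))
print(books)
```

Which rows to drop and which to fill is a judgment call that depends on the question being asked; here only the rating is treated as mandatory.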
This project examines the Goodreads-10k dataset, which holds information on 10,000 books. We use Python libraries such as pandas to investigate a set of scientific questions and surface trends in the data.
Our Goals:
- Uncover insights into book popularity and user preferences based on average rating and genre.
- Analyze publishing trends across different historical periods.
- Identify potential relationships between book categories and average ratings.
- Explore the role of tags in user interest and discoverability.
Outcome:
- Uniqueness and Novelty: This project aims to stand out in two ways:
  - Focus on User Preferences: It combines average rating analysis with tag exploration to understand what users find appealing.
  - Comparative Analysis: It compares publishing trends across different eras to identify potential shifts in reader interests.
The resulting data narrative can benefit authors, publishers, and book-recommendation platforms by providing a data-driven view of popular genres, historical publishing trends, and the role of user tags in book discoverability.
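One way to compare publishing eras is to bin publication years and summarize ratings per bin. The sketch below uses a handful of made-up rows; the column names and the era boundaries are assumptions chosen for illustration, not values taken from the project.

```python
import pandas as pd

# Toy rows standing in for the Goodreads-10k data; column names are assumptions.
books = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "original_publication_year": [1851, 1925, 1960, 1997, 2011],
    "average_rating": [3.5, 3.9, 4.3, 4.4, 4.0],
})

# Era boundaries are illustrative choices, not taken from the project.
bins = [-float("inf"), 1900, 1950, 2000, float("inf")]
labels = ["pre-1900", "1900-1949", "1950-1999", "2000+"]

# right=False makes each bin include its lower edge, e.g. [1900, 1950).
books["era"] = pd.cut(
    books["original_publication_year"], bins=bins, labels=labels, right=False
)

# Count books and average their ratings within each era.
summary = books.groupby("era", observed=True).agg(
    n_books=("title", "count"),
    mean_rating=("average_rating", "mean"),
)
print(summary)
```

With real data, the book count per era matters as much as the mean: an era with only a few titles gives an unreliable average.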
Data Narrative on Two US College Datasets (aaup and usnews)
I. Dataset 1: College Professor Statistics
- This dataset records, for each college and its state, the number of professors of each type it employs, along with their typical salaries.
- The data can be used for purposes such as calculating the professor-to-student ratio, analyzing faculty diversity, and comparing college statistics.
II. Dataset 2: US University Data
- This dataset contains comprehensive information on universities in the US, including their FICE code, name, state, public/private status, SAT and ACT scores, enrollment details, tuition fees, room and board costs, and more.
- It can be used for tasks like trend analysis, comparison of college statistics, data-driven decision making for educational institutions, and assessing the overall landscape of higher education in the US.
Our Goals:
- Combine and analyze both datasets to gain insights into the education sector.
- Explore trends and patterns in college and university statistics.
- Perform comparative analysis across colleges and universities.
- Utilize the data for data-driven decision making in education policy and institutional management.
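Combining the two tables might be sketched as follows. This assumes both datasets share a FICE code column that can serve as the join key; the source only confirms that code for the university data. All column names and values here are hypothetical.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables; all names are assumptions.
aaup = pd.DataFrame({
    "fice": [1001, 1002, 1003],
    "n_full_prof": [120, 40, 15],
    "n_assoc_prof": [80, 35, 20],
    "n_asst_prof": [60, 25, 10],
})
usnews = pd.DataFrame({
    "fice": [1001, 1002, 1004],
    "name": ["College X", "College Y", "College Z"],
    "public_private": ["public", "private", "public"],
    "enrollment": [13000, 2000, 5000],
})

# Keep only colleges present in both tables; validate guards against
# duplicated FICE codes silently multiplying rows.
merged = aaup.merge(usnews, on="fice", how="inner", validate="one_to_one")

# Total faculty and a simple students-per-professor ratio.
rank_cols = ["n_full_prof", "n_assoc_prof", "n_asst_prof"]
merged["n_faculty"] = merged[rank_cols].sum(axis=1)
merged["students_per_professor"] = merged["enrollment"] / merged["n_faculty"]

print(merged[["name", "n_faculty", "students_per_professor"]])
```

An inner join drops colleges that appear in only one table. Switching to `how="outer"` with `indicator=True` would show how many rows fail to match.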
Data Narrative on the Tennis Tournament Dataset
- This dataset contains detailed information on matches played in the four major (Grand Slam) tennis tournaments.
- It includes various data points such as player names, match results, and diverse match statistics like first serve percentage, aces, winners, and more.
- The dataset has 50 columns in total, covering match dynamics and player performance.
- Analyze match trends and patterns across different tournaments.
- Investigate factors contributing to match outcomes and player success.
- Explore correlations between match statistics and player rankings.
- Utilize the data to develop predictive models for match outcomes or player performance.
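A first look at which match statistics move together with the outcome could be a simple correlation table. The sketch below uses made-up numbers and hypothetical column names (`first_serve_pct`, `aces`, `winners`, `won`); the real dataset's columns may be named and encoded differently.

```python
import pandas as pd

# Toy per-player match rows; values and column names are illustrative assumptions.
matches = pd.DataFrame({
    "first_serve_pct": [62, 55, 70, 58, 66, 51],
    "aces": [10, 3, 12, 5, 9, 2],
    "winners": [35, 20, 40, 22, 30, 18],
    "won": [1, 0, 1, 0, 1, 0],  # 1 = player won the match
})

# Pearson correlation of each statistic with the match outcome.
corr_with_outcome = (
    matches.corr()["won"].drop("won").sort_values(ascending=False)
)
print(corr_with_outcome)
```

Correlation only flags candidates for further study. It does not show that a statistic causes wins, and a predictive model would need held-out data to be evaluated fairly.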
- Dimensionality Reduction for K-means Clustering: A Comparative Analysis Using PCA
- Evaluating Gaussian Naive Bayes for Digit Classification with Error Analysis
- Exploring K-Means Clustering for Handwritten Digits
- Impact of Noise on K-Means Clustering and the Role of PCA
- Implementing and Evaluating a Custom K-Nearest Neighbors Classifier
Mentor: Prof. Shanmughanathan Raman, IIT Gandhinagar