Data Cleaning and Correlation Analysis

This Python project cleans a dataset with pandas and then performs a correlation analysis on it, using seaborn for the visualizations. The dataset used for this project is "movies.csv".

Getting Started

To run this project, make sure the libraries it depends on are installed, including pandas, seaborn, numpy, and matplotlib.

Loading the Data

Load the dataset with the line below, replacing the path with the location of movies.csv on your machine (this assumes pandas has been imported as pd):

df = pd.read_csv(r"C:\Users\gorja\Downloads\python\movies.csv")

Data Cleaning

The cleaning steps are: checking for missing data and the data type of each column; changing the data types of the 'budget', 'gross', and 'votes' columns; creating a new column, 'year_correct', by extracting the year from the 'released' column; sorting the data by 'gross' in ascending order; and dropping any duplicate rows.
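
The cleaning steps could be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the toy rows are invented stand-ins for movies.csv, the 'name' column is hypothetical, and the choices to drop rows with missing values, to use int64 as the target type, and to treat 'released' as text containing a four-digit year are all assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for movies.csv: every value here is invented for illustration.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B", "Movie B", "Movie C"],  # hypothetical column
    "released": [
        "June 13, 1980 (United States)",  # assumed format of 'released'
        "July 2, 1980 (United States)",
        "July 2, 1980 (United States)",
        "May 25, 1983 (United States)",
    ],
    "budget": [19000000.0, 4500000.0, 4500000.0, np.nan],
    "gross": [46998772.0, 58853106.0, 58853106.0, 1000000.0],
    "votes": [927000.0, 65000.0, 65000.0, 1200.0],
})

# Check for missing data (fraction of missing values per column) and data types.
missing = df.isna().mean()
print(missing)
print(df.dtypes)

# Assumption: rows with missing values are dropped so the float columns
# can be cast to integers without errors.
df = df.dropna().astype({"budget": "int64", "gross": "int64", "votes": "int64"})

# Create 'year_correct' from the first four-digit number in 'released'.
df["year_correct"] = df["released"].str.extract(r"(\d{4})", expand=False)

# Sort by 'gross' in ascending order, then drop duplicate rows.
df = df.sort_values(by="gross", ascending=True)
df = df.drop_duplicates().reset_index(drop=True)

print(df)
```

Passing ascending=False to sort_values would list the highest-grossing rows first instead.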

Data Visualization

This section produces a scatter plot of 'budget' against 'gross' earnings, a scatter plot using the seaborn library, a heatmap of the correlation matrix, and a heatmap of the correlation matrix for all columns.
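
One way these plots might be produced is sketched below. The data frame holds invented toy values, the three-panel layout is an arbitrary choice, and using sns.scatterplot for the seaborn plot is an assumption about what the notebook does.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy values, invented for illustration only (not taken from movies.csv).
df = pd.DataFrame({
    "budget": [10, 20, 30, 40, 50],
    "gross": [25, 35, 70, 90, 120],
    "votes": [5, 9, 12, 20, 26],
})

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Scatter plot of budget vs gross with plain matplotlib.
axes[0].scatter(df["budget"], df["gross"])
axes[0].set_xlabel("budget")
axes[0].set_ylabel("gross")
axes[0].set_title("Budget vs gross earnings")

# The same relationship drawn with seaborn.
sns.scatterplot(data=df, x="budget", y="gross", ax=axes[1])
axes[1].set_title("Budget vs gross earnings (seaborn)")

# Heatmap of the correlation matrix of the numeric columns.
corr = df.select_dtypes("number").corr()
sns.heatmap(corr, annot=True, ax=axes[2])
axes[2].set_title("Correlation matrix")

fig.tight_layout()
```

Call plt.show() to display the figure, or fig.savefig with a file name of your choice to write it to disk.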

Numerizing Columns

Columns of object type are converted to numeric values using pandas categorical codes, so that they can be included in the correlation matrix.
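
A minimal sketch of this conversion, assuming a toy data frame with two hypothetical text columns ('company' and 'genre') and invented values:

```python
import pandas as pd

# Toy values, invented for illustration; 'company' and 'genre' are hypothetical columns.
df = pd.DataFrame({
    "company": ["Warner", "Universal", "Warner"],
    "genre": ["Drama", "Comedy", "Drama"],
    "gross": [100, 200, 150],
})

# Work on a copy so the original text values stay available.
df_numerized = df.copy()

for col in df_numerized.columns:
    # Only non-numeric (text) columns are converted.
    if not pd.api.types.is_numeric_dtype(df_numerized[col]):
        # Each distinct value gets an integer code; categories are sorted alphabetically.
        df_numerized[col] = df_numerized[col].astype("category").cat.codes

print(df_numerized)
```

The codes are arbitrary labels that follow alphabetical order, so a correlation involving a numerized column should be read with caution: the size of a code carries no meaning of its own.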

Correlation Analysis

The correlation matrix is calculated and the highly correlated pairs of columns are identified.
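
The idea can be sketched as below. The toy values are invented, 'runtime' is a hypothetical extra column, and the 0.5 cut-off for "high" correlation is an assumption, not a value taken from the project.

```python
import pandas as pd

# Toy values, invented for illustration; 'runtime' is a hypothetical column.
df = pd.DataFrame({
    "budget": [10, 20, 30, 40, 50],
    "gross": [25, 35, 70, 90, 120],
    "votes": [5, 9, 12, 20, 26],
    "runtime": [120, 90, 110, 95, 100],
})

# Pearson correlation matrix of all columns.
corr_matrix = df.corr()

# Flatten the matrix into a Series indexed by (column_a, column_b).
pairs = corr_matrix.unstack()

# Remove each column's correlation with itself.
first = pairs.index.get_level_values(0)
second = pairs.index.get_level_values(1)
pairs = pairs[first != second]

# Keep the pairs above the assumed threshold, strongest last.
threshold = 0.5  # assumption: what counts as "high" depends on the analysis
high_corr = pairs[pairs > threshold].sort_values()

print(high_corr)
```

Each pair appears twice in the result, once as (a, b) and once as (b, a), because the correlation matrix is symmetric. Comparing pairs.abs() against the threshold would also catch strong negative correlations.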

Conclusion

The correlation analysis found that 'votes' and 'budget' have the highest correlation with 'gross' earnings. The project also provides insights into other correlations within the dataset.

Feel free to explore and modify the code according to your requirements.

Note: Make sure to have the necessary dependencies installed and the dataset path updated before running the code.

Jayson-gor/Data_Cleaning_Correlation_Using_Python