DM-project1

This repository contains the code for the first project for Data Mining classes at the Poznań University of Technology. The code in this repository has been created by the team Kung Fu Pandas, consisting of:

The purpose of the code in this repository is to preprocess a dataset so that machine learning algorithms trained on it achieve better results. The dataset we have chosen is the Cars Dataset from 1970 to 2024, which contains 10 attributes and over 90,000 records.

A more detailed description of our project can be found in our report.

Prerequisites

To run the code from this repository, you need to have the following installed on your computer:

Python (version 3.10 or higher).

Additionally, to run the code samples in the file samples.ipynb, Jupyter Notebook is needed (alternatively, the file can be opened in Google Colaboratory).
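
If you are unsure whether your interpreter satisfies the version requirement above, a small check along the following lines may help. This is only an illustrative sketch and is not part of the repository; the helper name meets_minimum is hypothetical.

```python
import sys


def meets_minimum(version_info, minimum=(3, 10)):
    """Return True if the (major, minor) part of version_info is at least `minimum`.

    `meets_minimum` is a hypothetical helper written for this README, not a
    function shipped with the project.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # sys.version_info describes the interpreter that is running this script.
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum(sys.version_info):
        print(f"Python {current} satisfies the 3.10+ requirement.")
    else:
        print(f"Python {current} is older than 3.10; please upgrade.")
```

Run it with the same python command you intend to use for preprocess.py, so that the check applies to the right interpreter.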

Setup

To download the repository to your local computer, run the following command:

$ git clone https://github.com/MichalRedm/DM-project1.git

Then, from inside the cloned DM-project1 directory, install all the Python dependencies:

$ pip install -r requirements.txt

Once this is done, you are ready to run the code.

Preprocessing the data

The file preprocess.py is responsible for processing the dataset. To generate a preprocessed dataset, run the following command:

$ python preprocess.py

The new dataset will be written to the file CarsDataProcessed.csv.
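
To quickly confirm that the output file was produced and to see how it differs in size from the raw data, something like the sketch below could be used. It is not part of the repository: the helper summarize_csv is hypothetical, it relies only on the Python standard library, and it assumes that CarsData.csv (the raw dataset shipped in the repository) and CarsDataProcessed.csv both sit in the current working directory.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (row_count, column_names, empty_cell_count) for a CSV file.

    Hypothetical helper for a quick sanity check; the header line is not
    counted as a row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first line holds the column names
        rows = 0
        empty_cells = 0
        for row in reader:
            rows += 1
            empty_cells += sum(1 for cell in row if cell.strip() == "")
    return rows, header, empty_cells


if __name__ == "__main__":
    # Assumption: both files are in the current working directory.
    for name in ("CarsData.csv", "CarsDataProcessed.csv"):
        if Path(name).exists():
            rows, header, empty_cells = summarize_csv(name)
            print(f"{name}: {rows} rows, {len(header)} columns, "
                  f"{empty_cells} empty cells")
        else:
            print(f"{name}: not found")
```

The script only reports row, column, and empty-cell counts; it says nothing about whether the preprocessing improved model quality, which is what grid_search.ipynb is for.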

Testing the results

To test how well our preprocessing method works, compare it against the baseline by running the code in grid_search.ipynb.

Code samples

To get some insight into how our data preprocessing function operates, see the file samples.ipynb.
