# STINTSY MCO
The Major Course Output (MCO) for STINTSY (Advanced Intelligent Systems) consists of the following 11 sections:
- **Section 1** : Introduction to the problem/task and dataset
- **Section 2** : Description of the dataset
- **Section 3** : List of requirements
- **Section 4** : Data preprocessing and cleaning
- **Section 5** : Exploratory data analysis
- **Section 6** : Initial model training
- **Section 7** : Error analysis
- **Section 8** : Improving model performance
- **Section 9** : Model Performance Summary
- **Section 10** : Insights and conclusions
- **Section 11** : References

## Section 1 : Introduction 

Each group should select one real-world dataset from the list of datasets provided for the project. Each dataset is accompanied by a description file, which also contains a detailed description of each feature.

The target task (i.e., classification or regression) should be properly stated as well.

## Section 2 : Description of Dataset
In this section of the notebook, you must fulfill the following:
- State a brief description of the dataset.
- Provide a description of the collection process executed to build the dataset. Discuss the implications of the data collection method on the generated conclusions and insights. Note that you may need to look at relevant sources related to the dataset to acquire necessary information for this part of the project.
- Describe the structure of the dataset file.
    - What does each row and column represent?
    - How many instances are there in the dataset?
    - How many features are there in the dataset?
    - If the dataset is composed of different files that you will combine in the succeeding steps, describe the structure and the contents of each file.
- Discuss the features in each dataset file. What does each feature represent? All features, even those which are not used for the study, should be described to the reader. The purpose of each feature in the dataset should be clear to the reader of the notebook without having to go through an external link.
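The structural questions above (rows, columns, instance and feature counts) can be answered directly from the loaded data. The following is a minimal sketch, assuming the dataset is a CSV file read with pandas; the inline `csv_text` and its column names are hypothetical stand-ins for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real dataset file.
# In the notebook, replace this with pd.read_csv("<path to your dataset>").
csv_text = """age,income,city,label
25,30000,Manila,0
41,52000,Cebu,1
33,,Manila,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Number of instances (rows) and columns in the file.
n_instances, n_columns = df.shape
print(f"Instances (rows): {n_instances}")
print(f"Columns: {n_columns}")

# One row per column: its data type and how many values are missing.
structure = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
})
print(structure)
```

A table like `structure` does not replace the written description of each feature, but it gives the reader a quick reference for what follows.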

## Section 3 : List of Requirements
List all the Python libraries and modules that you used.
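One way to make this list reproducible is to print the installed version of each library next to its name. The sketch below uses the standard-library `importlib.metadata` module; the package names in `packages` are hypothetical and should be replaced with the ones the notebook actually imports.

```python
from importlib import metadata

# Hypothetical list; replace with the packages your notebook actually imports.
packages = ["numpy", "pandas", "scikit-learn", "matplotlib"]


def installed_versions(names):
    """Return a dict mapping each package name to its version string,
    or to 'not installed' when the package cannot be found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in installed_versions(packages).items():
    print(f"{name}: {version}")
```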

## Section 4 : Data Preprocessing and Cleaning

Perform necessary steps before using the data. In this section of the notebook, please take note of the following:

- If needed, perform preprocessing techniques to transform the data to the appropriate representation. This may include binning, log transformations, conversion to one-hot encoding, normalization, standardization, interpolation, truncation, and feature engineering, among others. There should be a correct and proper justification for the use of each preprocessing technique used in the project.
- Make sure that the data is clean, especially features that are used in the project. This may include checking for misrepresentations, checking the data type, dealing with missing data, dealing with duplicate data, and dealing with outliers, among others. There should be a correct and proper justification for the application (or non-application) of each data cleaning method used in the project. Clean only the variables utilized in the study.
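As an illustration of a few of the steps named above, here is a minimal sketch on hypothetical toy data. The column names, the choice of median imputation, and the choice of standardization are assumptions for the example only; the appropriate techniques depend on the dataset and must be justified in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data containing one duplicate row and one missing value.
df = pd.DataFrame({
    "age": [25.0, 41.0, 41.0, 33.0, 58.0],
    "income": [30000.0, 52000.0, 52000.0, np.nan, 75000.0],
    "city": ["Manila", "Cebu", "Cebu", "Manila", "Davao"],
})

# 1. Remove exact duplicate rows.
clean = df.drop_duplicates().reset_index(drop=True)

# 2. Fill missing numeric values with the column median (one option among several).
clean["income"] = clean["income"].fillna(clean["income"].median())

# 3. One-hot encode the categorical column.
clean = pd.get_dummies(clean, columns=["city"], dtype=int)

# 4. Standardize the numeric columns to zero mean and unit variance.
numeric_cols = ["age", "income"]
clean[numeric_cols] = StandardScaler().fit_transform(clean[numeric_cols])

print(clean)
```

In an actual modeling pipeline, the scaler (and any imputation statistics) would normally be fitted on the training split only and then applied to the validation and test splits, so that no information leaks from held-out data.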

## Section 5 : Exploratory Data Analysis

Perform comprehensive exploratory data analysis to gain a good understanding of your dataset. In this section of the notebook, you must present relevant numerical summaries and visualizations. Make sure that each code cell is accompanied by a brief explanation. The whole process should be supported with detailed textual descriptions of your procedures and findings.
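The numerical summaries mentioned above might look like the following sketch. The data and column names (`hours_studied`, `score`, `passed`) are hypothetical; the same pandas calls apply to any tabular dataset.

```python
import pandas as pd

# Hypothetical toy data: two numeric features and a binary target.
df = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "score": [52.0, 55.0, 61.0, 68.0, 74.0, 79.0],
    "passed": [0, 0, 1, 1, 1, 1],
})

# Count, mean, standard deviation, min, quartiles, and max per numeric feature.
summary = df[["hours_studied", "score"]].describe()

# Class balance of the target.
class_counts = df["passed"].value_counts().sort_index()

# Mean of a feature within each class.
mean_by_class = df.groupby("passed")["score"].mean()

# Pearson correlation between two numeric features.
correlation = df["hours_studied"].corr(df["score"])

print(summary)
print(class_counts)
print(mean_by_class)
print(f"Correlation: {correlation:.3f}")
```

Each summary should be followed by a short interpretation, for example what an imbalanced `class_counts` implies for the choice of evaluation metric.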

## Section 6 : Initial Model Training
Use machine learning models to accomplish your chosen task (i.e., classification or regression) for the dataset. In this section of the notebook, please take note of the following:
- The project should train and evaluate <u>at least 3 different kinds</u> of machine learning models. The models should not be multiple variations of the same model, e.g., three neural network models with different numbers of neurons.
- Each model should be appropriate in accomplishing the chosen task for the dataset. There should be a clear and correct justification on the use of each machine learning model.
- Make sure that the values of the hyperparameters of each model are mentioned. At a minimum, the optimizer, the learning rate, and the learning rate schedule should be discussed per model.
- The report should show that the models are neither overfitting nor underfitting.
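One common way to check for overfitting or underfitting is to compare each model's score on the training split with its score on a held-out validation split. The sketch below assumes a classification task and uses synthetic data from scikit-learn's `make_classification`; the data, the split ratio, and the hyperparameter values are placeholders rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two of the model families used in this notebook, with explicit hyperparameters.
models = {
    "knn (k=5)": KNeighborsClassifier(n_neighbors=5),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on data the model has seen
    val_acc = model.score(X_val, y_val)        # accuracy on held-out data
    scores[name] = (train_acc, val_acc)
    print(f"{name}: train={train_acc:.3f} val={val_acc:.3f} gap={train_acc - val_acc:.3f}")
```

A large gap between training and validation accuracy suggests overfitting, while low accuracy on both suggests underfitting. For a regression task, the same comparison can be made with an error metric such as RMSE.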

### Section 6.1 : K-Nearest Neighbor

### Section 6.2 : Linear Regression

### Section 6.3 : Logistic Regression

## Section 7 : Error Analysis
Perform error analysis on the output of all models used in the project. In this section of the notebook, you should:
- Report and properly interpret the initial performance of all models using appropriate evaluation metrics.
- Identify difficult classes and/or instances. For classification tasks, these are classes and/or instances that are difficult to classify. Hint: You may use a confusion matrix for this. For regression tasks, these are instances that produce high error.
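The hint above can be sketched as follows. The labels and predictions are hypothetical; in the notebook they would come from each trained model's output on the validation or test split.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Classification: hypothetical true and predicted labels for three classes.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 2])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# Recall per class: correct predictions divided by the number of true instances.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
hardest_class = int(np.argmin(per_class_recall))
print(cm)
print("Per-class recall:", per_class_recall)
print("Hardest class:", hardest_class)

# Regression: hypothetical targets and predictions.
y_true_reg = np.array([10.0, 12.0, 15.0, 20.0])
y_pred_reg = np.array([11.0, 12.5, 9.0, 19.0])

# The instance with the largest absolute error is the most difficult one.
abs_err = np.abs(y_true_reg - y_pred_reg)
worst_instance = int(np.argmax(abs_err))
print("Worst regression instance:", worst_instance, "error:", abs_err[worst_instance])
```

Per-class recall is only one way to rank difficulty; per-class precision or F1 may be more appropriate depending on the task.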

### Section 7.1 : Error Analysis for K-Nearest Neighbor

### Section 7.2 : Error Analysis for Linear Regression

### Section 7.3 : Error Analysis for Logistic Regression

## Section 8 : Improving Model Performance
Perform grid search or random search to tune the hyperparameters of each model. You should also tune each model to reduce the error in difficult classes and/or instances. In this section of the notebook, please take note of the following:
- Make sure to explain the method of hyperparameter tuning in detail.
- Explicitly mention the different hyperparameters and their range of values. Show the corresponding performance of each configuration.
- Report the performance of all models using appropriate evaluation metrics and visualizations.
- Properly interpret the result based on relevant evaluation metrics.
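A grid search that reports the performance of every configuration could look like the sketch below. It uses scikit-learn's `GridSearchCV` on synthetic data with K-Nearest Neighbor; the hyperparameter ranges, the number of folds, and the scoring metric are illustrative assumptions, not prescribed values.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real training data.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Hyperparameters and the values to try for each (4 x 2 = 8 configurations).
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

# One row per configuration, best first.
results = (
    pd.DataFrame(search.cv_results_)[
        ["param_n_neighbors", "param_weights", "mean_test_score", "std_test_score"]
    ]
    .sort_values("mean_test_score", ascending=False)
)
print(results.to_string(index=False))
print("Best configuration:", search.best_params_)
```

`RandomizedSearchCV` from the same module follows the same pattern when random search is preferred over an exhaustive grid.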

### Section 8.1 : Improving K-Nearest Neighbor

### Section 8.2 : Improving Linear Regression

### Section 8.3 : Improving Logistic Regression

## Section 9 : Model Performance Summary
Present a summary of all model configurations. In this section of the notebook, do the following:
- Discuss each algorithm and the best set of values for its hyperparameters. Identify the best model configuration and discuss its advantage over other configurations.
- Discuss how tuning each model helped in reducing its error in difficult classes and/or instances.

## Section 10 : Insights and Conclusions
Clearly state your insights and conclusions from training a model on the data. Why did some models produce better results? Summarize your conclusions to explain the performance of the models. Discuss recommendations to improve the performance of the model.

## Section 11 : References
Cite relevant references that you used in your project. All references must be cited, including:
- Scholarly Articles – Cite in APA format and put a description of how you used it for your work.
- Online references, blogs, articles that helped you come up with your project – Put the website, blog, or article title, link, and how you incorporated it into your work.
- Artificial Intelligence (AI) Tools – Put the model used (e.g., ChatGPT, Gemini), the complete transcript of your conversations with the model (including your prompts and its responses), and a description of how you used it for your work.
