# Business Intelligence Fall 2023 Exam Project

This project is designed as experimental research and development of a BI implementation solution. It involves systematic and creative work aimed at finding novel, uncertain, and reproducible results by applying modern BI and artificial intelligence (AI) technologies in a specific context.

The development workflow goes through four stages and milestones, each of which has an objective, tasks,
and deliverables.

## Stage 1: Problem Definition
### Objective: Foundation of a business case and problem statement

Dataset: [Data Science Salaries 2023 (Kaggle)](https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2023/data)

1. At this stage you brainstorm, browse sources of inspiration and information, collect ideas, and discuss business or social domains where BI and AI can add value.
2. Choose one of your ideas and define the context, purpose, research questions, and hypotheses for a BI problem statement. Write a brief annotation of your project, in about four sentences, explaining:
     - Which challenge would you like to address?
     - Why is it an important or interesting research goal?
     - What solution is your project expected to provide?
     - What would the impact of the solution be, and which category of users could benefit from it?
3. Prepare the development environment:
     - give a title to your project
     - plan and organise the execution of the individual tasks in terms of time, milestones, deliverables, and team member engagement
     - set up the development platform and procedures: GitHub repository, IDE, software tools
4. Create the initial project document with the information above as a .md file and upload it to your repository as the project's initial release.

## Stage 2: Data Preparation
### Objective: Data collection, exploration and pre-processing
Based on the ideas and assumptions defined at the previous stage:
1. Collect and load relevant data from various sources
2. Clean and integrate the collected data into appropriate data structures. Apply any transformations needed for integration and further operations, using ETL (Extract Transform Load) or ELT (Extract Load Transform).
3. Explore the data by applying statistical measures to discover its basic features. Create charts and diagrams that visualize these features, to aid understanding and support further decisions.
4. Apply the necessary pre-processing to prepare the data for machine learning analysis, ensuring that the data is:
    - Meaningful – describes relevant and correctly measured features and observations.
    - Sufficient – covers enough cases and feature occurrences, as determined by testing.
    - Shaped – presented in a structure appropriate for processing by machine learning algorithms.
    - Cleaned – with missing values and outliers repaired or removed.
    - Scaled – feature distributions transformed to comparable scales, when necessary.
    - Engineered – all features analysed and the most informative ones selected for further processing.
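The pre-processing checklist above can be sketched in pandas as follows. This is a minimal sketch, not the project's actual pipeline: it assumes the column names of the Kaggle salaries dataset, and the specific choices (the 1.5 × IQR outlier rule, dropping `salary` and `salary_currency` because they restate the target in local currency, one-hot encoding, z-score scaling) are illustrative. `preprocess`, `CATEGORICAL`, and `NUMERIC_FEATURES` are hypothetical names.

```python
import pandas as pd

# Column names assumed from the Kaggle data science salaries dataset
CATEGORICAL = ["experience_level", "employment_type", "job_title",
               "employee_residence", "company_location", "company_size"]
NUMERIC_FEATURES = ["work_year", "remote_ratio"]


def preprocess(df: pd.DataFrame, target: str = "salary_in_usd") -> pd.DataFrame:
    """Clean, shape and scale a salaries DataFrame for model training (illustrative)."""
    # Cleaned: drop exact duplicate rows and rows with a missing target
    out = df.drop_duplicates().dropna(subset=[target]).copy()

    # Cleaned: drop target outliers outside 1.5 * IQR beyond the quartiles
    q1, q3 = out[target].quantile([0.25, 0.75])
    iqr = q3 - q1
    out = out[out[target].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    # Engineered: salary + salary_currency restate the target, so drop them to avoid leakage
    out = out.drop(columns=["salary", "salary_currency"], errors="ignore")

    # Shaped: one-hot encode the categorical columns that are present
    cats = [c for c in CATEGORICAL if c in out.columns]
    out = pd.get_dummies(out, columns=cats, dtype=float)

    # Scaled: z-score the numeric feature columns (the target stays in USD)
    for col in NUMERIC_FEATURES:
        if col in out.columns:
            std = out[col].std(ddof=0)
            out[col] = (out[col] - out[col].mean()) / std if std > 0 else 0.0
    return out
```

Applied to the loaded data, `preprocess(data)` returns a purely numeric frame whose `salary_in_usd` column can serve as the regression target. The outlier rule and the choice of dropped columns are worth revisiting once the exploration step shows how the salaries are actually distributed.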
    
Export the initial version of your solution to the GitHub repository.

## Stage 3: Solution Prototype
### Objective: Using data and analysis for building predictive models
Extend the data analysis by implementing machine learning and deep learning methods and algorithms.
1. Select relevant methods that could solve the problem. Train, test, and validate data models using supervised and unsupervised methods, neural networks, or graphs.
2. Select and apply appropriate measures for assessing the quality of your models. Iterate the process to explore ways of improving the quality of the models.
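A train/test/evaluate loop for these steps might look like the following. This is a minimal sketch, not the project's chosen method: it uses plain least-squares linear regression in NumPy as a stand-in model and reports mean absolute error (MAE) and R² as quality measures. `split_train_test`, `fit_linear`, `predict_linear`, and `evaluate` are hypothetical helpers; scikit-learn offers comparable building blocks (`train_test_split`, `LinearRegression`, `mean_absolute_error`, `r2_score`).

```python
import numpy as np


def split_train_test(X, y, test_size=0.2, seed=0):
    """Shuffle row indices and hold out `test_size` of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = max(1, int(round(len(X) * test_size)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]


def fit_linear(X, y):
    """Ordinary least squares with a leading intercept term; returns the weights."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def predict_linear(w, X):
    """Apply weights from fit_linear to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ w


def evaluate(y_true, y_pred):
    """Mean absolute error and coefficient of determination (R^2)."""
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {"mae": float(np.mean(np.abs(residuals))), "r2": float(1 - ss_res / ss_tot)}
```

To make the iteration step concrete, compare each model with a baseline that always predicts the training mean (`np.full_like(y_test, y_train.mean())`). A new feature set or model is only worth keeping if it beats that baseline on the held-out rows. With a pre-processed frame (hypothetically named `processed`), `X` would be `processed.drop(columns="salary_in_usd").to_numpy(dtype=float)` and `y` would be `processed["salary_in_usd"].to_numpy(dtype=float)`.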

Publish the new version of your solution on GitHub as a prototype.

## Stage 4: Visualisation, Explanation and Usability Evaluation
### Objective: Present the process and the results of the analysis in human-understandable form
Extend your solution with visualisation, explanation and interpretation of the results:
1. Design and develop visual representations of the data, the analysis process, the applied methods, and usage scenarios. Consider the use of animation, 3D, or VR visualisation, as appropriate.
2. Create a simple visual interface for the application to make it accessible and interactive for other users.
3. Present the visualised prototype to potential users for usability evaluation. Take notes and implement the relevant feedback. Elaborate on the benefits of applying visualisation and explanation techniques to data analytics.
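A simple static chart is one possible starting point for the visual representation step. The following is a minimal matplotlib sketch, assuming the `experience_level` and `salary_in_usd` columns of the salaries dataset. It plots the median USD salary per experience level in seniority order. `plot_salary_by_level`, `median_salary_by_level`, and the short labels are hypothetical choices made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Seniority order of the experience_level codes, from the column descriptions
LEVEL_ORDER = ["EN", "MI", "SE", "EX"]
LEVEL_LABELS = {"EN": "Entry", "MI": "Mid", "SE": "Senior", "EX": "Executive"}


def median_salary_by_level(df: pd.DataFrame) -> pd.Series:
    """Median salary_in_usd per experience level, in seniority order (unknown codes dropped)."""
    medians = df.groupby("experience_level")["salary_in_usd"].median()
    return medians.reindex([lvl for lvl in LEVEL_ORDER if lvl in medians.index])


def plot_salary_by_level(df: pd.DataFrame):
    """Bar chart of median USD salary per experience level; returns (figure, medians)."""
    medians = median_salary_by_level(df)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar([LEVEL_LABELS[lvl] for lvl in medians.index], medians.values, color="steelblue")
    ax.set_xlabel("Experience level")
    ax.set_ylabel("Median salary (USD)")
    ax.set_title("Median data science salary by experience level")
    fig.tight_layout()
    return fig, medians
```

The median is used here rather than the mean because it is less sensitive to a few very high salaries. Returning the aggregated `medians` along with the figure lets the same numbers be shown in a table or an interactive interface.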

Revise and complete the final solution, deliver it to GitHub, and submit a link to it in Wiseflow.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

```python
!pwd
```

```
/Users/emiliocastrolagunas/Documents/GitHub/BI-Fall-2023-Exam-Project
```


```python
# Read the CSV relative to the repository root, which is the notebook's working directory
data = pd.read_csv('Data/ds_salaries2.csv')
data.head()
```
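After `data` is loaded, the first statistical overview (Stage 2, step 3) can be collected in one place. This is a minimal sketch. `summarize` is a hypothetical helper, and it assumes the `salary_in_usd` and `work_year` columns described below. The selected statistics are only a starting point.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect a few first-look statistics for the salaries DataFrame."""
    return {
        "shape": df.shape,                                  # (rows, columns)
        "missing_per_column": df.isna().sum(),              # NaN count per column
        "duplicate_rows": int(df.duplicated().sum()),       # exact duplicate rows
        "salary_stats": df["salary_in_usd"].describe(),     # count, mean, std, min, quartiles, max
        "rows_per_year": df["work_year"].value_counts().sort_index(),
    }
```

In the notebook, `stats = summarize(data)` followed by `stats["salary_stats"]` shows the spread of salaries. The missing-value and duplicate counts indicate how much cleaning the pre-processing step will need.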

- `work_year`: The year the salary was paid.
- `experience_level`: The experience level in the job during the year, with the following possible values:
  - `EN`: Entry-level / Junior
  - `MI`: Mid-level / Intermediate
  - `SE`: Senior-level / Expert
  - `EX`: Executive-level / Director
- `employment_type`: The type of employment for the role:
  - `PT`: Part-time
  - `FT`: Full-time
  - `CT`: Contract
  - `FL`: Freelance
- `job_title`: The role worked in during the year.
- `salary`: The total gross salary amount paid.
- `salary_currency`: The currency of the salary paid, as an ISO 4217 currency code.
- `salary_in_usd`: The salary in USD (FX rate divided by the average USD rate for the respective year, via fxdata.foorilla.com).
- `employee_residence`: The employee's primary country of residence during the work year, as an ISO 3166 country code.
- `remote_ratio`: The overall amount of work done remotely, with the following possible values:
  - `0`: No remote work (less than 20%)
  - `50`: Partially remote
  - `100`: Fully remote (more than 80%)
- `company_location`: The country of the employer's main office or contracting branch, as an ISO 3166 country code.
- `company_size`: The average number of people who worked for the company during the year:
  - `S`: less than 50 employees (small)
  - `M`: 50 to 250 employees (medium)
  - `L`: more than 250 employees (large)
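The coded columns above can be decoded into readable labels for exploration and charts. This is a minimal sketch. The mappings are copied from the column descriptions. `decode_codes` and `DECODERS` are hypothetical names. Codes not listed in the descriptions become `NaN`, so unexpected values are easy to spot.

```python
import pandas as pd

# Code-to-label mappings taken from the column descriptions
DECODERS = {
    "experience_level": {"EN": "Entry-level / Junior", "MI": "Mid-level / Intermediate",
                         "SE": "Senior-level / Expert", "EX": "Executive-level / Director"},
    "employment_type": {"PT": "Part-time", "FT": "Full-time",
                        "CT": "Contract", "FL": "Freelance"},
    "remote_ratio": {0: "No remote work", 50: "Partially remote", 100: "Fully remote"},
    "company_size": {"S": "Small", "M": "Medium", "L": "Large"},
}


def decode_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with coded columns replaced by labels; unknown codes become NaN."""
    out = df.copy()
    for col, mapping in DECODERS.items():
        if col in out.columns:
            out[col] = out[col].map(mapping)
    return out
```

The original frame stays unchanged, so the compact codes can still be used for modelling and the decoded copy for presentation. For example, `decode_codes(data)["experience_level"].isna().sum()` counts rows whose code does not match the documented values.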