# Introduction to Machine Learning

![image.png](attachment:image.png)

## Artificial Intelligence (AI)

> Artificial Intelligence (AI) is a broad field that encompasses the development of computer systems capable of performing tasks that typically require human intelligence. The goal of AI is to create intelligent machines that can perceive, learn, reason, and take actions to achieve specific goals. AI systems can perform tasks such as speech recognition, image recognition, decision-making, and problem-solving, among others.


## Machine Learning

> Machine Learning (ML) is a subset of AI that focuses on the development of algorithms and statistical models that enable computer systems to learn from data and improve their performance on a specific task over time, without being explicitly programmed. Machine Learning algorithms can automatically identify patterns in data and use those patterns to make accurate predictions or decisions without relying on predetermined rules or instructions.


## Deep Learning

> Deep Learning is a specialized subset of Machine Learning that is loosely inspired by the structure and function of the human brain. Deep Learning models are artificial neural networks composed of multiple layers of interconnected nodes (artificial neurons) that can learn and extract features from large amounts of data. Deep Learning has been particularly successful in areas such as computer vision, speech recognition, and natural language processing.


## Data Science

> Data Science is an interdisciplinary field that combines statistics, mathematics, computer science, and domain-specific knowledge to extract insights and knowledge from structured and unstructured data. Data scientists use various techniques, including Machine Learning, to collect, process, analyze, and interpret data in order to uncover patterns, trends, and relationships that can inform decision-making and drive business value.

# Differences:

#### Scope: 

- AI is the broadest field, encompassing the development of intelligent systems capable of performing human-like tasks. Machine Learning is a subfield of AI, and Deep Learning is a subfield of Machine Learning; both focus on specific techniques and algorithms that enable systems to learn and make predictions from data. Data Science is an overlapping discipline that uses Machine Learning, alongside other techniques, to extract insights from data.

#### Approach: 

- AI systems can be developed using various approaches, including rule-based systems, knowledge-based systems, and Machine Learning. Machine Learning and Deep Learning specifically focus on algorithms that can learn from data, with Deep Learning being a more advanced and specialized approach within Machine Learning.

#### Data Dependency: 

- While AI systems can be developed without relying heavily on data, Machine Learning and Deep Learning are inherently data-driven, meaning they require large amounts of data to train their models effectively. Data Science also heavily relies on data and its analysis, but it incorporates other techniques beyond Machine Learning.

#### Applications: 

- AI has a wide range of applications, including robotics, natural language processing, expert systems, and more. Machine Learning and Deep Learning are particularly suited for tasks involving pattern recognition, prediction, and decision-making based on data. Data Science has applications across various domains, including business intelligence, marketing, finance, healthcare, and scientific research.

# Concepts to Cover

* core concepts of machine learning
* the history of ML
* ML and fairness
* regression ML techniques
* classification ML techniques
* clustering ML techniques
* natural language processing ML techniques
* time series forecasting ML techniques
* reinforcement learning


# What we will not cover

* deep learning
* neural networks
* AI

## Examples of Machine Learning

You can use machine learning in many ways:

* To predict the likelihood of disease from a patient's medical history or reports.
* To leverage weather data to predict weather events.
* To understand the sentiment of a text.
* To detect fake news to stop the spread of propaganda.

Almost all fields are now using machine learning to make predictions and decisions.

> Finance, economics, earth science, space exploration, biomedical engineering, cognitive science, and even fields in the humanities have adapted machine learning to solve the arduous, data-processing heavy problems of their domain.




# A Brief History of Machine Learning

![image.png](attachment:image.png)

- **1950s**: The concept of Artificial Intelligence was introduced, and the first AI programs, such as the Logic Theorist and the Neural Network Simulator, were developed.

- **1959**: Arthur Samuel coined the term "Machine Learning" while working on a program that could learn to play checkers.

- **1967**: The Nearest Neighbor algorithm was introduced, laying the foundation for modern pattern recognition techniques.

- **1970s-1980s**: Machine Learning research focused on areas like decision trees, neural networks, and genetic algorithms.

- **1986**: The backpropagation algorithm for training neural networks was rediscovered, leading to a resurgence of interest in neural networks.

- **1990s**: The development of Support Vector Machines (SVMs) and the emergence of kernel methods provided powerful tools for classification and regression tasks.

- **2006**: The Netflix Prize competition was launched, sparking significant interest in collaborative filtering and recommender systems.

- **2006**: The term "Deep Learning" came into wide use, referring to deep neural networks with multiple hidden layers.

- **2012**: AlexNet, a deep convolutional neural network, achieved breakthrough performance in the ImageNet Large Scale Visual Recognition Challenge, marking the beginning of the deep learning revolution.

- **Present**: Machine Learning and Deep Learning have become widespread, with applications in various domains, including computer vision, natural language processing, robotics, and healthcare.

## AI winter

- **AI Winter (1970s-1980s)**: A period of reduced funding and interest in AI research due to the failure of early AI systems to meet overly ambitious expectations and the lack of computational power.

- **AI Winter (1987-1993)**: Another period of decreased interest and funding in AI research, caused by the limitations of expert systems and the high cost of developing AI systems.

    - Limited compute. Computing power was too limited for the problems being attempted.
    - Combinatorial explosion. The number of parameters that needed to be trained grew exponentially as more was asked of computers, without a parallel evolution of compute power and capability.
    - Paucity of data. Too little data was available to test, develop, and refine algorithms.




## 1993 - 2011

- Advent of big data era
  - Rapid increase in data availability
  - Widespread adoption of smartphones around 2007
  - Access to large amounts of data for training ML/AI systems

- Exponential growth in computational power
  - Enabled processing of large datasets
  - Facilitated complex calculations and algorithms

- Advancements in algorithms and techniques
  - Evolution of sophisticated algorithms
  - Capable of solving previously intractable problems

- Confluence of factors enabling practical applications
  - Availability of data, computing power, and advanced algorithms
  - Widespread adoption of ML and AI solutions


## Now


Today machine learning and AI touch almost every part of our lives.

![image.png](attachment:image.png)

# Fairness in Machine Learning


### What is fairness in ML?

![image.png](attachment:image.png)

> <span style="color: yellow;">Criminal Justice Systems:</span> One of the most cited examples of algorithmic bias affecting Black individuals is in predictive policing and recidivism prediction tools. Studies have shown that algorithms used to predict future criminal behavior are more likely to falsely flag Black defendants as higher risks for reoffending compared to their White counterparts. This bias can influence sentencing, bail, and parole decisions, disproportionately impacting Black communities.

- Reference book: Machine See, Machine Do: How Technology Mirrors Bias in Our Criminal Justice System
- Reference: https://fairmlbook.org

> <span style="color: yellow;">Facial Recognition Technologies:</span> Black individuals have been disproportionately affected by inaccuracies in facial recognition technologies. Studies have shown that these systems are less accurate at identifying Black faces compared to White faces, leading to higher rates of misidentification. This can have serious consequences, including wrongful arrests and surveillance, reinforcing systemic biases in law enforcement and security.

![image.png](attachment:image.png)

- [MIT Researcher Exposing Bias in Facial Recognition Tech Triggers Amazon’s Wrath](https://www.insurancejournal.com/news/national/2019/04/08/523153.htm)


![image.png](attachment:image.png)

> Google Photos App Tags Dark Skinned People As Gorillas

>  <span style="color: yellow;">Healthcare Algorithms:</span> Research has uncovered racial bias in algorithms used to prioritize patients for healthcare services, such as programs for managing chronic diseases. One study found that an algorithm widely used in the U.S. healthcare system was less likely to refer Black people than White people who were equally sick to programs that aim to improve care for patients with complex medical needs. 


> <span style="color: yellow;">Employment and Hiring Tools:</span> Automated systems used to screen job applicants can perpetuate biases against Black candidates. If these algorithms are trained on data reflecting historical hiring practices or societal biases, they may favor resumes from demographic groups that are overrepresented in certain industries, disadvantaging Black applicants.

> <span style="color: yellow;">Fairness in machine learning refers to the goal of creating algorithms that make decisions without bias towards any individual or group, especially those defined by protected attributes such as race, gender, age, or sexual orientation.</span>



#### Lesson 

- Raise your awareness of the importance of fairness in machine learning.
- Learn about fairness-related harms.
- Learn about unfairness assessment and mitigation.

## Fairness-related harms

> Allocation. A system favors one group, such as a gender or ethnicity, over another. <span style="color: yellow;">Consider a hypothetical system for screening loan applications. The system tends to pick white men as better candidates over other groups. As a result, loans are withheld from certain applicants.</span>



> Quality of service. If you train a model on data for one specific scenario but reality is much more complex, the result is a poorly performing service.

> Stereotyping. Associating a given group with pre-assigned attributes.

> Denigration. To unfairly criticize and label something or someone.

> Over- or under-representation. The idea is that a certain group is not seen in a certain profession, and any service or function that keeps promoting that is contributing to harm.

#### Can we have fairness in ML?
Guaranteeing fairness in AI and machine learning remains a complex sociotechnical challenge. It cannot be addressed from either purely social or technical perspectives.


Achieving fairness in machine learning is challenging and requires a multifaceted approach, including:

- Bias Detection and Mitigation Techniques: Identifying and reducing bias in data and model predictions.

- Diverse Datasets: Including a wide range of data from various groups to improve representation.

- Fairness Metrics: Employing metrics to evaluate fairness across different groups.

- Inclusive Development Processes: Incorporating perspectives from diverse groups in the development and deployment of machine learning systems.
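
As a rough illustration of the fairness metrics mentioned above, the sketch below compares two simple per-group measures: the selection rate (how often a group is flagged) and the false positive rate (how often people who should not be flagged are flagged anyway). The records, the group names `A` and `B`, and the helper `group_rates` are all invented for this example; it is one possible way to compute such metrics, not a standard implementation.

```python
from collections import defaultdict

# Synthetic, made-up records: (group, true_label, predicted_label).
# Label 1 means "flagged as high risk"; groups "A" and "B" are placeholders.
records = [
    ("A", 0, 0), ("A", 0, 1), ("A", 1, 1), ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 1), ("B", 0, 1), ("B", 1, 1), ("B", 0, 0), ("B", 1, 0),
]

def group_rates(rows):
    """Return the selection rate and false positive rate for each group."""
    counts = defaultdict(lambda: {"n": 0, "flagged": 0, "neg": 0, "false_pos": 0})
    for group, y_true, y_pred in rows:
        c = counts[group]
        c["n"] += 1
        c["flagged"] += y_pred
        if y_true == 0:
            # A true negative that was flagged counts as a false positive.
            c["neg"] += 1
            c["false_pos"] += y_pred
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "selection_rate": c["flagged"] / c["n"],
            "false_positive_rate": c["false_pos"] / c["neg"] if c["neg"] else float("nan"),
        }
    return rates

rates = group_rates(records)
for group, r in sorted(rates.items()):
    print(group, r)

fpr_gap = abs(rates["A"]["false_positive_rate"] - rates["B"]["false_positive_rate"])
print("False positive rate gap:", round(fpr_gap, 3))
```

In this toy data both groups are flagged at the same rate (0.6), yet group B's false positive rate is twice that of group A. Different fairness metrics can therefore disagree, which is one reason several of them are usually examined together.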

# Techniques of Machine Learning


1. <span style="color: yellow;">Decide on the question:</span> Most ML processes start by asking a question that cannot be answered by a simple conditional program or rules-based engine. These questions often revolve around predictions based on a collection of data.

2. <span style="color: yellow;">Collect and prepare data:</span> To be able to answer your question, you need data. The quality and, sometimes, quantity of your data will determine how well you can answer your initial question. Visualizing data is an important aspect of this phase. This phase also includes splitting the data into a training and testing group to build a model.

3. <span style="color: yellow;">Choose a training method:</span> Depending on your question and the nature of your data, you need to choose how you want to train a model to best reflect your data and make accurate predictions against it. This is the part of your ML process that requires specific expertise and, often, a considerable amount of experimentation.

4. <span style="color: yellow;">Train the model:</span> Using your training data, you'll use various algorithms to train a model to recognize patterns in the data. The model might leverage internal weights that can be adjusted to privilege certain parts of the data over others to build a better model.

5. <span style="color: yellow;">Evaluate the model:</span> You use never-before-seen data (your testing data) from your collected set to see how the model is performing.

6. <span style="color: yellow;">Parameter tuning:</span> Based on the performance of your model, you can redo the process using different parameters, or variables, that control the behavior of the algorithms used to train the model.

7. <span style="color: yellow;">Predict:</span> Use new inputs to test the accuracy of your model.
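
The workflow above can be sketched end to end in a few lines of plain Python. Everything in this example is an assumption made for illustration: the data are two synthetic clusters, the training method is a hand-written k-nearest-neighbours classifier, and a separate validation split is introduced so that parameter tuning does not touch the test data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Collect and prepare data: two synthetic 2-D clusters with labels 0 and 1.
def make_point(label):
    center = 0.0 if label == 0 else 3.0
    return ([random.gauss(center, 1.0), random.gauss(center, 1.0)], label)

data = [make_point(i % 2) for i in range(200)]
random.shuffle(data)
train, valid, test = data[:120], data[120:160], data[160:]

# Choose a training method: k-nearest neighbours, which "trains" by
# simply storing the training rows and comparing new points against them.
def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def predict(train_rows, x, k):
    nearest = sorted(train_rows, key=lambda row: squared_distance(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0  # majority vote; k is odd, so no ties

def accuracy(train_rows, eval_rows, k):
    correct = sum(predict(train_rows, x, k) == y for x, y in eval_rows)
    return correct / len(eval_rows)

# Parameter tuning: choose k on the validation split, not the test split.
best_k = max([1, 3, 5, 7], key=lambda k: accuracy(train, valid, k))

# Evaluate the model on data it has never seen.
test_accuracy = accuracy(train, test, best_k)
print("best k:", best_k, "test accuracy:", round(test_accuracy, 3))

# Predict: apply the model to a new input.
print("prediction for [2.8, 3.1]:", predict(train, [2.8, 3.1], best_k))
```

Real projects would typically rely on a library rather than a hand-written classifier, but the order of the steps stays the same.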

![image.png](attachment:image.png)

## Question?

- Who will win the election?



# Data?

- How to collect data?

- How to make sure the data is not biased?

- How to make sure the data is not skewed?

- How to make sure the data is not imbalanced?

- How to make sure the data is not missing?

- How to make sure the data is not noisy?

- How to make sure the data is not irrelevant?


## Prepare data

- There are several steps in the data preparation process.
- You might need to collate data and normalize it if it comes from diverse sources.
- Handle missing data by either removing those rows or imputing missing values.
- Convert categorical data to numerical format if required by the machine learning model.
- Scale numerical data to a common range, such as 0 to 1, to prevent features with larger values from dominating.
- Split the data into training and test sets, with the test set held back to evaluate the final model's performance.
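
The preparation steps above might look roughly like this in plain Python. The rows, column meanings, and the two-thirds split ratio are invented for illustration. For brevity, the imputation mean and the scaling range are computed on all rows; in practice they would usually be computed on the training set only, so that no information from the test set leaks into training.

```python
import random

# Hypothetical raw rows: (size_sqm, location, price); None marks a missing value.
rows = [
    (50.0, "city", 150.0),
    (80.0, "suburb", 200.0),
    (None, "city", 180.0),
    (120.0, "rural", 210.0),
    (65.0, "suburb", 170.0),
    (95.0, "city", 260.0),
]

# Handle missing data: impute a missing size with the mean of the observed sizes.
observed = [r[0] for r in rows if r[0] is not None]
mean_size = sum(observed) / len(observed)
sizes = [r[0] if r[0] is not None else mean_size for r in rows]

# Scale numerical data to the range 0 to 1 (min-max scaling).
lo, hi = min(sizes), max(sizes)
scaled_sizes = [(s - lo) / (hi - lo) for s in sizes]

# Convert categorical data to numbers with one-hot encoding.
categories = sorted({r[1] for r in rows})
one_hot = [[1.0 if r[1] == c else 0.0 for c in categories] for r in rows]

# Assemble feature vectors (X) and targets (y).
X = [[s] + oh for s, oh in zip(scaled_sizes, one_hot)]
y = [r[2] for r in rows]

# Split into training and test sets; the test set is held back for evaluation.
random.seed(0)
indices = list(range(len(rows)))
random.shuffle(indices)
cut = int(0.67 * len(indices))  # roughly two thirds for training
train_idx, test_idx = indices[:cut], indices[cut:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

print(categories)                  # one-hot column order
print(X[2])                        # the row whose size was imputed
print(len(X_train), len(X_test))   # sizes of the two splits
```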
  

### Features and Target

- In machine learning, we aim to build models that can learn patterns from data and make predictions or decisions based on those patterns. 

- To do this, we need to provide the model with input data (features) and the corresponding output data (target) that we want the model to predict.

Features:

- Location
- Number of rooms
- Plot size
- Year built
- Toilets

Target:

- Price



![image.png](attachment:image.png)


### **Features**:
Features are the independent variables or attributes that describe the characteristics of the data. They are the inputs that the machine learning model will use to learn patterns and make predictions. For example:

1. In a housing price prediction model, features could include the size of the house, number of bedrooms, location, age of the property, etc.
2. In a spam email detection model, features could be the presence of certain words or phrases, the sender's email address, the subject line, etc.
3. In a customer churn prediction model, features could be the customer's subscription plan, usage patterns, customer service interactions, and demographic information.


### **Target**:

The target is the dependent variable or the output that we want the machine learning model to predict based on the input features. It is the variable we are interested in predicting or estimating. For example:

1. In the housing price prediction model, the target would be the actual selling price of the house.
2. In the spam email detection model, the target would be a binary classification of whether an email is spam or not spam.
3. In the customer churn prediction model, the target would be whether a customer is likely to churn (cancel their subscription) or not.

The machine learning model tries to learn the relationship between the features and the target variable from the training data. Once the model is trained, it can then take new sets of input features and make predictions about the corresponding target variable.
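
To make the feature/target relationship concrete, here is a minimal sketch that fits a straight line to a single feature with ordinary least squares. The numbers are invented and deliberately lie exactly on the line price = 3 × size so that the result is easy to check by hand; the helper `fit_line` is a name chosen for this example only.

```python
# Hypothetical training data: one feature (house size in square metres)
# and the target (selling price in thousands). The numbers are invented.
sizes = [50.0, 60.0, 80.0, 100.0, 120.0]      # feature
prices = [150.0, 180.0, 240.0, 300.0, 360.0]  # target

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# "Training": learn the relationship between the feature and the target.
slope, intercept = fit_line(sizes, prices)

# Once fitted, the model maps a new feature value to a predicted target.
new_size = 90.0
predicted_price = slope * new_size + intercept
print(round(slope, 2), round(intercept, 2), round(predicted_price, 2))
# prints: 3.0 0.0 270.0
```

Real data rarely fit a line this cleanly, and a housing model would normally use several features at once, but the roles stay the same: features go in, a predicted target comes out.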