![](../images/logos/KIEPSKIES.jpg)

# Foundations of Data Science
## 🚀 Welcome to Data Science

Hey there! 👋 Ready to dive into the world of **Data Science**? Whether you're a beginner or have some experience, this introduction will give you a solid foundation in the key concepts, terminology, and real-world applications. So, grab your favorite drink ☕ and let’s get started!

______________________________________________________________________________________________________

### 🎯 What is Data Science?

Imagine you have a **huge pile of raw data**—numbers, text, images, or even sounds. Data Science is all about **making sense of that data** to uncover patterns, insights, and trends that can drive decisions.

It’s a mix of:

- **Mathematics & Statistics** – To analyze and interpret data
- **Programming** – To manipulate, process, and visualize data
- **Domain Knowledge** – To apply insights to real-world problems
- **Machine Learning (AI)** – To build predictive models that learn from data

In simple terms, Data Science helps turn messy data into meaningful stories! 📊📖

_________________________________________________________________________

### 🔑 Key Terminology in Data Science
Here are some must-know terms you'll hear often:

📌 **Dataset**

A collection of data in a structured format (e.g., a table with rows and columns). Think of an Excel sheet full of customer purchases.

📌 **Features & Labels**
- Features: The characteristics or attributes of the data (e.g., age, income, product purchased).
- Label (Target Variable): The outcome we’re trying to predict (e.g., will a customer buy again?).

📌 **Structured vs. Unstructured Data**

- Structured: Organized in tables (like spreadsheets or databases).
- Unstructured: Free-form data like images, audio, and text.

📌 **Big Data**

Extremely large datasets that require special tools (like Hadoop or Spark) to process. Think **Google searches, social media feeds, or Netflix recommendations!**

📌 **Machine Learning (ML)**

A subset of AI that enables computers to learn patterns from data and make predictions without being explicitly programmed.

📌 **Artificial Intelligence (AI)**

The broader field of machines performing tasks that typically require human intelligence (e.g., speech recognition, recommendation systems).

📌 **Data Cleaning**

The process of removing or correcting errors, missing values, and inconsistencies in data. Garbage in, garbage out! If your data is messy, your results will be too.

📌 **Data Visualization**

Turning data into charts, graphs, and dashboards for better understanding. **A picture is worth a thousand numbers!** 📊🎨

___________________________________________________________________________________

### 🌎 How is Data Science Used in the Real World?

You see Data Science in action **every single day**—even if you don’t realize it! Here are some cool real-world applications:

🎥 **Netflix & YouTube Recommendations**

Ever wondered how Netflix always knows what you’d love to watch next? **Data Science + Machine Learning** analyze your viewing history to suggest movies & shows.

🛒 **E-Commerce (Amazon, Jumia, Alibaba)**

- Personalized product recommendations 🏷️
- Predicting which products will be in demand 📈
- Detecting fraudulent transactions 🔍

🏥 **Healthcare**

- Diagnosing diseases from medical images 🏥
- Predicting patient readmission rates 📉
- Personalized treatment plans 💊

🚗 **Self-Driving Cars (Tesla, Waymo)**

Using sensors & ML to detect objects, recognize road signs, and make real-time driving decisions.

💰 **Finance & Fraud Detection**

Banks and fintech companies use AI to detect unusual transactions, preventing fraud before it happens! 🚨

______________________________________________________________________________

### 🏆 Why Learn Data Science?

If you're wondering whether Data Science is the right path for you, here are **3 reasons why it’s an amazing skill to have:**

- **High Demand:** Every industry—from healthcare to finance—needs Data Scientists. The job market is booming! 🚀
- **Great Earning Potential:** Data Science roles offer **high salaries** because skilled professionals are in demand. 💰
- **Problem-Solving Power:** You get to **solve real-world problems** and make a real impact!

____________________________________________________________________________________

### 💡 Your Data Science Journey Starts Here!

Data Science is a fascinating field with endless possibilities! In this course, we’ll cover:

- 🚀 How to set up your Python environment 🐍
- 📊 Data wrangling and visualization techniques
- 🧠 Introduction to Machine Learning models
- 🛠️ Hands-on projects to apply your skills

Stay curious, ask questions, and have fun along the way! Ready to become a Data Science pro? Let’s go! 🎯🔥

______________________________________________________________________________________

![](../images/logos/python.jpeg)
## Python 

Python is a versatile, high-level programming language known for its simplicity and readability. It is widely used in web development, data analysis, automation, and more.

### 1. Why Learn Python?

Python is one of the most popular programming languages in the world, and here's why you should consider learning it:

- **Easy to learn:** Python’s syntax is simple and readable, making it a great choice for beginners.
- **Versatile:** Python is used in many fields—data analysis, web development, machine learning, automation, and more!
- **Growing demand:** Python skills are highly sought after in the job market, from tech companies to startups.
- **Great community:** You'll find a wealth of resources and support online as the Python community is vast and active.

______________________________________________________________________________________________________________________

### 2. What Can Python Do? 

Python can do a lot! Here are just a few examples:

- **Data Analysis:** 'Crunch numbers', analyze trends, and visualize data.
- **Web Development:** Build websites with frameworks like Django or Flask.
- **Automation:** Write scripts to automate repetitive tasks.
- **Machine Learning & AI:** Create intelligent systems that learn from data.
- **Scripting & Software Development:** Develop applications from simple tools to complex systems.

_________________________________________________________________________________________________________________________________________

### 3. Installation of Python and necessary toolkits

Before we start coding, we need to set up your environment. Follow these steps to install Python and its essential tools:

1. **Install Python:**

- Go to Python’s official website. You can find it [here](http://python.org/).
- Download the latest version for your operating system.
- Run the installer. On Windows, make sure to check the box that says “Add Python to PATH” (the exact wording can vary between installer versions).

2. **Install a Code Editor:**

For this course, we recommend **VS Code** and **Jupyter Notebook**.

- VS Code: Download it from [here](https://code.visualstudio.com/). *Remember to install the Python extension so you can run Python code.*
- Jupyter Notebook: Install it by typing `pip install notebook` in your terminal or command prompt. Alternatively, you can use `pip3 install jupyter`.
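Once the installation finishes, one quick way to confirm that Python works is to run a short script like the sketch below. It only uses the standard `sys` module, and the version it prints depends on what you installed. The check for Python 3 is an assumption of this sketch, based on the tools used in this course.

```python
import sys

# Show which Python interpreter is running and its version.
print("Python executable:", sys.executable)
print("Python version:", sys.version.split()[0])

# Assumption of this sketch: the course uses Python 3.
is_python3 = sys.version_info.major == 3
print("Running Python 3:", is_python3)
```

If the last line prints `False`, you are probably running an older interpreter and may need to use the `python3` command instead of `python`.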

<span style="color: green;">**Practical exercise**</span>

Let's install Python and its toolkit:

- Install Python (Windows, Mac, or Linux).
- Install VS Code. We will use it to discuss Python fundamentals.
- Finally, install Jupyter Notebook.

Then write our first "Hello, World!" program.

#### <span style="color: #00008B;">First Python Program</span> 

Let’s dive into some basic but super important Python features: the `print()` function and comments. Ready? Let's go!

**What is the print() function?**

The `print()` function is how you tell Python to display something on the screen. It’s the simplest way to interact with your code, kind of like telling Python, “Hey, show me this!”

Here's an example:

In [1]:
print("Hello, world!")

Hello, world!


<span style="color: orange;">**Try it out:**</span> 

Open your Python environment and type this in! What will you print to the screen? Maybe your name or a fun fact about yourself?

In [2]:
print("Hello Mustafa!")

Hello Mustafa!


**Comments**

Comments are lines in your code that Python ignores when running your program. They’re super useful for explaining what your code does (and leaving little notes for your future self or others!).

In Python, comments start with a `#` symbol. Like this:

In [3]:
# This is a comment and won't be executed

See? There is no output because Python ignored the comment. Let's try combining comments with other lines of code.

In [4]:
# This is a comment and won't be executed
print("Python is fun!")  # This prints a message

Python is fun!


**Notice the position of the `#` sign**

Comments are used to remind yourself why you wrote a certain line of code or to disable certain parts of your code temporarily.
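Here is a minimal sketch of using a comment to switch off a line temporarily. The variable name `total` and the numbers are made up for illustration.

```python
total = 10

# total = total + 5  # Disabled: Python skips this line completely.
total = total + 1    # This line still runs.

print(total)
```

Because the middle line is commented out, only the `+ 1` is applied. Remove the leading `#` to switch that line back on.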

<span style="color: green;">**Practical exercise**</span>

Write a small program that prints your name and includes a comment describing what the code does. Put your solution in the cell below:

In [5]:
# Solution

________________________________________________________________________________________________________________

### <span style="color: #00008B;">The Building Blocks of Python 🧱</span> 

Now that you've mastered printing and commenting, let’s learn about Python data types and operators. These are the fundamental tools you'll use to work with data in Python. That's why they are referred to as the **Building Blocks!**

### <span style="color: #00008B;">Basic Data Types</span> 

1. **Strings `(str)`:** Think of strings as text—anything inside quotes is a string.

In [6]:
name = 'Mustafa'
greeting = "Hello, World!"

Strings can be written inside single quotation marks (`''`) or double quotation marks (`""`).
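Why have two kinds of quotes? One practical reason is that each lets you include the other inside your text. The short sketch below shows this; the phrases are made up for illustration.

```python
# Double quotes make it easy to include an apostrophe in the text.
phrase = "It's a great day to learn Python"

# Single quotes make it easy to include double quotes in the text.
quote = 'She said, "Data is everywhere."'

# Both kinds of quotes create the same string value.
same = 'Mustafa' == "Mustafa"

print(phrase)
print(quote)
print(same)
```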

<span style="color: orange;">**Try it:**</span> Write your own string and print it! What's your favorite quote or phrase?

2. **Integers `(int)`:** Whole numbers, no decimal point.

In [7]:
age = 25
apples = 5

<span style="color: orange;">**Try it:**</span> Declare an integer and use `print()` to display it.

3. **Floats `(float)`:** Numbers with a decimal point.

In [8]:
price = 19.99
weight = 72.5

<span style="color: orange;">**Try it:**</span> Store and print a float.

4. **Booleans `(bool)`:** Only `True` or `False`. Booleans are used in logical comparisons.

In [9]:
is_student = True
is_sunny = False

<span style="color: orange;">**Try it:**</span> Create your own boolean variable and print it. Is today a weekday? 😄
____________________________________________________________________________________________
### <span style="color: #00008B;">Operators</span> 

Operators allow you to manipulate and compare data. Let’s break it down:

1. **Arithmetic Operators:** These let you perform calculations with numbers.

- `+` (Addition): `5 + 3 → 8`
- `-` (Subtraction): `10 - 6 → 4`
- `*` (Multiplication): `4 * 2 → 8`
- `/` (Division): `15 / 3 → 5.0`
- `**` (Exponentiation): `2 ** 3 → 8` (2 raised to the power of 3)


<span style="color: green;">**Practical Activity**</span> 

Try adding, subtracting, multiplying, and dividing numbers! For example:

In [10]:
print(10 + 5)  # Add numbers
print(9 / 2)   # Divide numbers

15
4.5


2. **Comparison Operators:** These compare values and return `True` or `False`.

- `==` (Equal to): `5 == 5 → True`
- `!=` (Not equal to): `3 != 4 → True`
- `>` (Greater than): `7 > 2 → True`
- `<` (Less than): `3 < 5 → True`

<span style="color: green;">**Practical Activity**</span>  

Use comparison operators to check if your age is greater than a number or if a price is equal to what you expect. Example:

In [11]:
print(10 > 5)   # Is 10 greater than 5?
print(5 == 5)   # Is 5 equal to 5?

True
True


3. **Logical Operators:** These are used with booleans (`True` or `False`) to make more complex conditions.

- `and`: Both conditions must be `True`.
- `or`: At least one condition must be `True`.
- `not`: Reverses the boolean value.

**Example:**

In [12]:
is_raining = False
is_sunny = True
print(is_raining and is_sunny)  # Both must be true
print(is_raining or is_sunny)   # At least one is true

False
True
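The example above covers `and` and `or`. As a small follow-up sketch, here is `not` in action, reusing the same two variables. The name `good_for_a_walk` is made up for illustration.

```python
is_raining = False
is_sunny = True

# `not` flips a boolean: not False is True.
print(not is_raining)

# Operators can be combined; parentheses make the grouping explicit.
good_for_a_walk = is_sunny and (not is_raining)
print(good_for_a_walk)
```

Both lines print `True`, because it is sunny and it is not raining.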


_________________________________________________________________________________________________________________________________________
## <span style="color: #00008B;">📊 Data Types, Data Sources & Data Formats</span> 

Hey there, future Data Scientist! 👋 Today, we’re diving deep into **Data Types**, **Data Sources**, and **Data Formats**—the foundation of every data-driven project. By the end, you'll know **where data comes from, how it’s stored, and how to work with different data structures** in Python. Plus, we’ve got some hands-on practical exercises to sharpen your skills! Ready? Let’s go! 🚀
_________________________________________________________________________________________________________________

### <span style="color: #00008B;">🔹 1. Data Types: The Building Blocks of Data Science</span>

Every dataset is made up of different types of data. Understanding them helps in **data cleaning, transformation, and modeling.**

**📌 Common Data Types in Python**

| Data Type | Description | Example |
|:--- |:--- |:--- |
| Integer (`int`) | Whole numbers | `42, -7, 1000` |
| Float (`float`) | Decimal numbers | `3.14, -2.5, 100.0` |
| String (`str`) | Textual data | `"hello", "123", "data science"` |
| Boolean (`bool`) | True/False values | `True, False` |
| List (`list`) | Ordered, changeable collection of values | `[1, 2, 3], ["apple", "banana"]` |
| Tuple (`tuple`) | Ordered, immutable collection | `(5, 10, 15)` |
| Dictionary (`dict`) | Key-value pairs | `{"name": "John", "age": 25}` |
| Set (`set`) | Unordered collection of unique values (duplicates are dropped) | `{1, 2, 3, 3, 4}` |


💡 **Pro Tip:** Choosing the right data type can improve memory efficiency and processing speed.

<span style="color: orange;">**Try it!**</span> 

Predict the data type of each variable, then run the code to check:

In [13]:
# What will be the data type of each variable?
x = 10
y = 3.5
z = "Data Science"
a = [1, 2, 3, 4]
b = (10, 20, 30)
c = {"name": "Alice", "age": 25}
d = {1, 2, 2, 3, 4}
e = True

# Print their types
print(type(x))
print(type(y))
print(type(z))
print(type(a))
print(type(b))
print(type(c))
print(type(d))
print(type(e))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'list'>
<class 'tuple'>
<class 'dict'>
<class 'set'>
<class 'bool'>
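Knowing the type is only half the story, because each collection type also behaves differently. The sketch below reuses the values from the cell above to show a few of those differences. It uses `try`/`except` only to catch the error that Python raises when you try to change a tuple; `tuple_changed` is a made-up name for illustration.

```python
# A set keeps only unique values, so the duplicate 2 is dropped.
d = {1, 2, 2, 3, 4}
print(len(d))  # 4

# A list is mutable: items can be added after creation.
a = [1, 2, 3, 4]
a.append(5)
print(a)  # [1, 2, 3, 4, 5]

# A tuple is immutable: trying to change an item raises a TypeError.
b = (10, 20, 30)
try:
    b[0] = 99
except TypeError:
    tuple_changed = False
else:
    tuple_changed = True
print(tuple_changed)  # False

# A dictionary looks up values by key.
c = {"name": "Alice", "age": 25}
print(c["name"])  # Alice
```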


_______________________________________________________________________________________________________________

### <span style="color: #00008B;"> 🌍 2. Data Sources: Where Does Data Come From?</span> 

Data Science wouldn't exist without data! Before we look at specific data sources, let's explore how data is categorized:

<span style="color: #00008B;">**📌 Structured vs. Unstructured Data Sources**</span> 

| Type | Description | Example |
|:--- |:--- |:--- |
| Structured Data | Organized in tables/databases | SQL databases, Spreadsheets |
| Unstructured Data | No fixed format | Text, Images, Videos, PDFs |
| Semi-Structured Data | Partially structured, using tags or keys | JSON, XML, API responses |
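To make "semi-structured" more concrete, here is a minimal sketch that parses a small JSON string with Python's built-in `json` module. The record itself (`name`, `age`, `skills`) is made up for illustration.

```python
import json

# A hypothetical JSON string, similar in shape to what an API might return.
raw = '{"name": "Alice", "age": 25, "skills": ["Python", "SQL"]}'

# json.loads converts JSON text into Python objects (here, a dictionary).
record = json.loads(raw)

print(type(record))
print(record["name"])
print(record["skills"][0])

# json.dumps converts Python objects back into JSON text.
text = json.dumps(record, indent=2)
print(text)
```

Notice that the data has keys and nesting but no fixed table layout. That is what makes it semi-structured.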

<span style="color: #00008B;">**📌 Common Data Sources**</span> 

Now that we have explored the different types of data, let's find out where data comes from:

- Public datasets
- APIs
- Web scraping

Let's look at these sources in more detail:

- **Public Datasets**

Want free data to practice? Check these out:

* [Kaggle 🏆](https://www.kaggle.com/datasets)
* [Google Dataset Search 🔍](https://datasetsearch.research.google.com/)
* [UCI Machine Learning Repository 🎓](https://archive.ics.uci.edu/datasets/)

- **APIs (Application Programming Interfaces)**

APIs let us fetch live data from platforms like Twitter, OpenWeather, and Google Maps.

<span style="color: orange;">**Try it!**</span> Fetching stock data from the Alpha Vantage API

In [14]:
# Keep the API key out of your code by storing it in an environment variable:
## export API_KEY="your_api_key_here"  # Linux/macOS
## set API_KEY=your_api_key_here       # Windows (Command Prompt)

# Access the API key
import os
api_key = os.getenv('API_KEY')  # Read the API key from the environment variable

# Fetch the data
import requests

# Get your own free API key from https://www.alphavantage.co/support/#api-key
url = f'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=IBM&interval=5min&apikey={api_key}'
r = requests.get(url)
data = r.json()  # Convert the JSON response into a Python dictionary

print(data)

{'Meta Data': {'1. Information': 'Intraday (5min) open, high, low, close prices and volume', '2. Symbol': 'IBM', '3. Last Refreshed': '2025-02-28 19:55:00', '4. Interval': '5min', '5. Output Size': 'Compact', '6. Time Zone': 'US/Eastern'}, 'Time Series (5min)': {'2025-02-28 19:55:00': {'1. open': '252.5000', '2. high': '252.5000', '3. low': '252.0300', '4. close': '252.5000', '5. volume': '46'}, '2025-02-28 19:50:00': {'1. open': '252.5000', '2. high': '252.5000', '3. low': '252.5000', '4. close': '252.5000', '5. volume': '131'}, '2025-02-28 19:45:00': {'1. open': '252.5000', '2. high': '252.5900', '3. low': '252.4300', '4. close': '252.5000', '5. volume': '89'}, '2025-02-28 19:35:00': {'1. open': '252.4000', '2. high': '252.4000', '3. low': '252.4000', '4. close': '252.4000', '5. volume': '2'}, '2025-02-28 19:30:00': {'1. open': '252.0200', '2. high': '252.0200', '3. low': '252.0200', '4. close': '252.0200', '5. volume': '70'}, '2025-02-28 19:25:00': {'1. open': '252.0200', '2. high':

💡 **Pro Tip:** Always check an API’s documentation to understand how to request and process data. For instance, the API documentation for Alpha Vantage can be found [here](https://www.alphavantage.co/documentation/).
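The raw response is a nested dictionary, so the next step is usually to pull out the values you care about. Below is a minimal sketch that does this without calling the API: `sample` copies two entries from the output printed above, and `latest_close` is a hypothetical helper written for this lesson, not part of Alpha Vantage or `requests`.

```python
# A small sample copied from the response printed above (two entries only).
sample = {
    "Meta Data": {"2. Symbol": "IBM", "4. Interval": "5min"},
    "Time Series (5min)": {
        "2025-02-28 19:55:00": {"4. close": "252.5000", "5. volume": "46"},
        "2025-02-28 19:50:00": {"4. close": "252.5000", "5. volume": "131"},
    },
}


def latest_close(data, series_key="Time Series (5min)"):
    """Return (timestamp, close price) for the newest entry, or None.

    Hypothetical helper for illustration. It returns None when the
    expected key is missing, e.g. if the API sent a message instead of data.
    """
    series = data.get(series_key)
    if not series:
        return None
    # Timestamps in this format sort correctly as plain strings.
    newest = max(series)
    # Prices arrive as strings, so convert to float before doing math.
    return newest, float(series[newest]["4. close"])


print(latest_close(sample))
print(latest_close({"Note": "no data"}))
```

Checking for a missing key first is a deliberate choice here: a response without the expected time series would otherwise crash the code with a `KeyError`.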

<span style="color: green;">**Practical Exercise**</span>: Fetch Data from a Public API

Modify the above code to fetch data from another API (e.g., CoinGecko API for cryptocurrency prices).

In [15]:
import requests

url = "https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd"

response = requests.get(url)
data = response.json()

print(f"Bitcoin Price: ${data['bitcoin']['usd']}")


Bitcoin Price: $85854


<span style="color: #00008B;">**📌 Data Formats: How is Data Stored?**</span> 
    
| Format | Description | Example |
|:--- |:--- |:--- |
| CSV (Comma-Separated Values) | Plain text format for tabular data | `data.csv` |
| JSON (JavaScript Object Notation) | Lightweight data format, used in APIs | `data.json` |
| XML (eXtensible Markup Language) | Used for structured data storage | `data.xml` |
| Parquet | Columnar format optimized for big data | `data.parquet` |
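As a rough sketch of how two of these formats look in practice, the code below writes the same made-up rows as CSV and as JSON using only Python's standard library. `io.StringIO` stands in for real files such as `data.csv`, so nothing is written to disk. The names and ages are invented for illustration.

```python
import csv
import io
import json

# Hypothetical rows used only for illustration.
rows = [
    {"name": "Alice", "age": "25"},
    {"name": "Bob", "age": "30"},
]

# Write the rows as CSV text. io.StringIO stands in for a file like data.csv.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()
print(csv_text)

# Read the CSV text back into a list of dictionaries.
rows_from_csv = list(csv.DictReader(io.StringIO(csv_text)))
print(rows_from_csv)

# The same rows as JSON text, as they might appear in data.json.
json_text = json.dumps(rows)
print(json_text)
rows_from_json = json.loads(json_text)
```

The ages are kept as strings on purpose: CSV stores everything as text, so numbers read from a CSV file need to be converted before you do math with them.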

## <span style="color: #00008B;">🚀 Conditional Statements & Loops in Python</span>

## <span style="color: #00008B;">🚀 Functions in Python</span>

## <span style="color: #00008B;">🚀 Python Lambda</span>

## <span style="color: #00008B;">🚀 Error Handling in Python - Try ...Except</span>

_________________________________________________________________________________
## <span style="color: #00008B;">📊 Data Collection Methods and Data Ethics 🤖⚖️</span>

Welcome to one of the most important aspects of Data Science—Data Collection & Ethics! 🏆

Why does it matter? Because garbage in = garbage out! 🚮 If your data is unreliable or collected unethically, no fancy AI model will save you.

<span style="color: #00008B;">**How Do We Collect Data?** </span>

Data doesn’t magically appear—we gather it through different methods. Let’s explore the most common ones:

- Manual data collection - Surveys (Google Forms, Typeform), interviews, and observations.
- Web scraping - Used to collect large datasets from the web. Before scraping a site, read its `robots.txt` file to see which pages automated tools are allowed to access.
- APIs
- Public datasets - Sites like Kaggle.com provide freely available datasets, though their accuracy is not guaranteed. They are best suited to learning rather than to tackling a real-world problem!
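Before scraping, you can check a site's rules in code. The sketch below uses Python's built-in `urllib.robotparser` on a made-up `robots.txt`, so it runs without internet access. The rules, `example.com`, and the paths are placeholders, not a real scraping target.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, used here so the example runs offline.
robots_txt = """
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# example.com and the paths below are placeholders for illustration.
can_read_public = parser.can_fetch("*", "https://example.com/articles/")
can_read_private = parser.can_fetch("*", "https://example.com/private/data")

print(can_read_public)
print(can_read_private)
```

For a real site, you would load its actual rules with `parser.set_url(...)` followed by `parser.read()` instead of the hard-coded text. Keep in mind that `robots.txt` is only one signal, so check the site's terms of use as well.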

<span style="color: #00008B;">**⚖️ Data Ethics: The Rules of the Game**</span>

With great data power comes great responsibility! 🦸‍♂️ Let’s discuss ethical data collection. Here is what you should consider before collecting data:

- *Informed Consent* - People should know when their data is being collected & how it will be used. No sneaky business! For instance, websites ask for cookie permissions before tracking your behavior.
- *Privacy & Anonymity* - Never collect personal data (names, emails, health records) without protecting it. For example, storing anonymized customer purchase history is advisable, while selling personal phone numbers to advertisers is unethical.
- *Avoid Bias & Discrimination* - Collected data should cover diverse backgrounds. This is important when training an AI model, so that it makes fair and just decisions.
- *Data Security* - Always encrypt & protect data from hackers.
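As one possible illustration of the privacy point above, the sketch below replaces email addresses with hashed IDs before storing purchase history. Everything in it is made up: the records, the `SALT` value, and the `pseudonymize` helper. Strictly speaking this is pseudonymization rather than full anonymization, since someone who knows the salt and an email can recompute the ID, so treat it as a starting point rather than a complete privacy solution.

```python
import hashlib

# Made-up purchase records; the emails are placeholders, not real people.
purchases = [
    {"email": "alice@example.com", "item": "laptop"},
    {"email": "bob@example.com", "item": "phone"},
    {"email": "alice@example.com", "item": "mouse"},
]

# Hypothetical secret mixed into the hash; keep the real one out of your code.
SALT = "replace-with-a-secret-value"


def pseudonymize(email):
    """Replace an email with a short, repeatable ID derived from a hash."""
    digest = hashlib.sha256((SALT + email.lower()).encode("utf-8")).hexdigest()
    return digest[:12]


# Keep the purchase details but drop the raw email address.
safe_purchases = [
    {"customer_id": pseudonymize(p["email"]), "item": p["item"]}
    for p in purchases
]

for row in safe_purchases:
    print(row)
```

The same email always maps to the same ID, so you can still count purchases per customer without storing who the customer is.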

**Ethical or Not?**

- A company collects user data without consent.
- A researcher anonymizes survey responses before sharing.
- An AI hiring model favors one race/gender over another.
- A business buys customer emails from a third party.