# Anomaly Detection in Real Time

**Author:** T Mohamed Yaser  

*This notebook is created for the Pathway tutorial series.*



Welcome! This notebook walks you through real-time anomaly detection on a login dataset with Python and Pathway: we count failed logins per IP address in one-minute windows and flag the IPs that fail too often. Let's make data processing both professional and fun! 🚀

## 🚀 Downloading the Dataset (with Explanation & Fun!)

Before we start detecting anomalies, let's grab our dataset! Below, we use Python's `requests` library to fetch the login data. Each line is explained so you know exactly what's happening. 

> 💡 **Tip:** Want to make your notebook even cooler? Add a GIF showing the download in action! For example:
>
<img src="https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExZ2NhN2U0cTBrMnFqc2ptOXhhb3ZsNTdteTVnbmg2cGR1cTljaTBiYiZlcD12MV9naWZzX3NlYXJjaCZjdD1n/11ASZtb7vdJagM/giphy.gif" alt="Download Animation" style="width:350px; height:350px;">



In [6]:
import requests  # Import the requests library for HTTP operations

url = "https://public-pathway-releases.s3.eu-central-1.amazonaws.com/data/suspicious_users_tutorial_logins.csv"  # The URL of our dataset
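response = requests.get(url, timeout=30)  # Send a GET request to download the file
response.raise_for_status()  # Stop here if the download failed (e.g. a 404 or 500 response)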
with open("logins.csv", "wb") as f:  # Open a new file in write-binary mode
    f.write(response.content)  # Write the downloaded content to the file

## 📦 Installing Pathway — Your Data Sidekick!

Before we dive into real-time anomaly detection, let's make sure we have our secret weapon: the `pathway` library! This cell installs Pathway straight from PyPI, so you’re always ready for blazing-fast data processing. 🚀

> 🛠️ *Pro tip: If you already have Pathway, pip keeps the installed version as is. Run `pip install --upgrade pathway` to update it to the latest version!*
>
> ![Install Animation](https://media.giphy.com/media/26ufnwz3wDUli7GU0/giphy.gif)


In [None]:
%%capture --no-display
!pip install pathway

## 🏗️ Setting Up Pathway & Data Schema

Now, let's import Pathway and define the schema for our login data. The schema tells Pathway how the CSV file is structured: it names the four columns (`username`, `successful`, `time`, `ip_address`) and gives each one a type. Note that `successful` is read as a string for now and `time` as an integer timestamp.


<img src="https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExY3F6M2Y1MWx4ZWp5bXBwMzNpc2E5MzZwOW1nbnprNGtjZTZrazVyeSZlcD12MV9naWZzX3NlYXJjaCZjdD1n/l2JeblbdfRL0i2qOI/giphy.gif" alt="Download Animation" style="width:400px; height:350px;">

- `InputSchema`: Tells Pathway what columns to expect and their types.
- `pw.io.csv.read`: Reads the CSV into a Pathway table for further analysis. With `mode="static"`, the file is read once in full rather than watched for new rows.
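
To get a feel for what a schema buys us, here is a minimal plain-Python sketch that applies the same column types by hand with the standard `csv` module. The two sample rows, the `COLUMN_TYPES` mapping, and the `parse_row` helper are made up for illustration; this is not how Pathway works internally.

```python
import csv
import io

# Made-up sample rows that mirror the four columns described above.
SAMPLE_CSV = """username,successful,time,ip_address
alice,True,1675862400,203.0.113.7
bob,False,1675862405,198.51.100.4
"""

# Column name -> Python type, mirroring the fields of InputSchema.
COLUMN_TYPES = {"username": str, "successful": str, "time": int, "ip_address": str}


def parse_row(raw_row):
    """Cast each raw CSV string to the type declared for its column."""
    return {name: cast(raw_row[name]) for name, cast in COLUMN_TYPES.items()}


rows = [parse_row(raw) for raw in csv.DictReader(io.StringIO(SAMPLE_CSV))]

# `time` is now an int, so arithmetic on it works without extra conversion.
print(rows[1]["time"] - rows[0]["time"])  # 5
```

Without the declared types, every value would stay a string and the time arithmetic used later for windowing would not work.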

In [None]:
import pathway as pw



class InputSchema(pw.Schema):
    username: str
    successful: str
    time: int
    ip_address: str


logins = pw.io.csv.read(
    "logins.csv",
    schema=InputSchema,
    mode="static",
)


## 🔄 Converting Login Success to Boolean

Let's make our data easier to work with! Here, we convert the `successful` column from a string ("True"/"False") to a real boolean (`True`/`False`). This helps with filtering and analysis later on.

- `with_columns`: Adds or modifies columns in the Pathway table.
- `pw.this.successful == "True"`: Compares each value with the string `"True"`, producing `True` for successful logins and `False` for everything else.
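
Why compare against `"True"` instead of simply casting to a boolean? The plain-Python sketch below shows the pitfall. The `to_bool` helper and the sample values are made up for illustration.

```python
# Any non-empty string is truthy, so bool() is the wrong tool here.
print(bool("False"))  # True


def to_bool(text):
    """Return True only for the exact string "True"."""
    return text == "True"


raw_values = ["True", "False", "False"]
flags = [to_bool(value) for value in raw_values]
print(flags)  # [True, False, False]
```

The explicit comparison is the same idea the Pathway expression applies to every row of the `successful` column.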

<img src="https://media.giphy.com/media/v1.Y2lkPWVjZjA1ZTQ3bWExbDQycWJ2MTNha3IxajBwejRhejQxanN2aTVlbXU0cmR4dDhpcCZlcD12MV9naWZzX3NlYXJjaCZjdD1n/0G6S8B24tygYx27pPS/giphy.gif" alt="Download Animation" style="width:350px; height:350px;">

In [None]:
logins = logins.with_columns(successful=(pw.this.successful == "True"))

## ❌ Filtering Failed Logins

Now, let's focus on the failed login attempts. We filter the data to keep only the rows where the login was not successful. This is crucial for detecting suspicious activity!

- `filter(~pw.this.successful)`: Keeps only failed logins (where successful is False).
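
Pathway expressions use `~` for negation because Python does not let libraries overload the `not` keyword. On ordinary Python values, the same filter would be written with `not`. Here is a minimal sketch on made-up rows (the names `sample_logins` and `sample_failed` are hypothetical):

```python
# Made-up sample rows; only the `successful` flag matters here.
sample_logins = [
    {"username": "alice", "successful": True},
    {"username": "bob", "successful": False},
    {"username": "carol", "successful": False},
]

# Plain-Python counterpart of filter(~pw.this.successful):
# keep the rows whose `successful` flag is False.
sample_failed = [row for row in sample_logins if not row["successful"]]

print([row["username"] for row in sample_failed])  # ['bob', 'carol']
```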

> ![Filter Animation](https://media.giphy.com/media/3o6Zt6ML6BklcajjsA/giphy.gif)

In [None]:
failed = logins.filter(~pw.this.successful)

## ⏲️ Counting Failed Logins per IP (per Minute)

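Let's see which IP addresses are failing to log in the most! We use a tumbling window, that is, a series of fixed-size, non-overlapping time intervals, to count failed logins per IP address in each 60-second interval (assuming `time` is expressed in seconds).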

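- `windowby`: Groups rows into time windows based on the `time` column. Passing `instance=pw.this.ip_address` keeps a separate set of windows for each IP address.
- `reduce`: Counts the number of failed logins per group. `pw.this._pw_instance` carries the instance value, here the IP address.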
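
The windowing logic can be sketched in plain Python to show what is being computed. This is only an illustration on made-up events, assuming timestamps in seconds and windows aligned to multiples of the duration; Pathway's own implementation and window alignment may differ. The `THRESHOLD` of 5 matches the cut-off this notebook applies in its final cell.

```python
from collections import Counter

DURATION = 60  # window length in seconds, as in the tumbling window above
THRESHOLD = 5  # minimum number of failures in one window to flag an IP

# Made-up failed-login events: (time_in_seconds, ip_address).
events = [(t, "203.0.113.7") for t in (0, 10, 20, 30, 40, 59)]
events += [(61, "203.0.113.7"), (5, "198.51.100.4"), (65, "198.51.100.4")]

# Each event falls into exactly one window: the one starting at the
# largest multiple of DURATION that is not greater than its timestamp.
counts = Counter()
for t, ip in events:
    window_start = (t // DURATION) * DURATION
    counts[(ip, window_start)] += 1

# Keep only the (ip, window) pairs that reach the threshold.
suspicious = {key: n for key, n in counts.items() if n >= THRESHOLD}
print(suspicious)  # {('203.0.113.7', 0): 6}
```

Because the windows do not overlap, the failure at second 61 starts a fresh count for the same IP address instead of adding to the six failures in the first minute.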

> ![Window Animation](https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExMDh3cTh5NjkxNjJwanV3cm1xb2E3c3ZjMTA2OWR5NXZ3anM0NGxwNyZlcD12MV9naWZzX3NlYXJjaCZjdD1n/4JVTF9zR9BicshFAb7/giphy.gif)

In [None]:
result = failed.windowby(
    failed.time, window=pw.temporal.tumbling(duration=60), instance=pw.this.ip_address
).reduce(
    ip_address=pw.this._pw_instance,
    count=pw.reducers.count(),
)

In [None]:
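# Flag every IP address with 5 or more failed logins inside a single 60-second window.
suspicious_logins = result.filter(pw.this.count >= 5)
# Run the computation and print the resulting table.
pw.debug.compute_and_print(suspicious_logins)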


---

# 🎉 Congratulations!

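You've just built an anomaly detector using Pathway and Python! 🚦 This notebook runs it in static mode on a downloaded file; the same windowing and filtering logic is what you would apply to a live stream of logins.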

Feel free to experiment, tweak, and explore further. Remember, every data adventure makes you a better data scientist. Keep learning and stay curious!

> Made with ❤️ by T Mohamed Yaser with Pathway

![Party Animation](https://media.giphy.com/media/111ebonMs90YLu/giphy.gif)