<a href="https://colab.research.google.com/github/Phishinf/AITeam/blob/main/WifiTracker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Appendix: Estimating the Number of Wi-Fi-Enabled Mobile Devices with a Wi-Fi Dongle

## Introduction

### One method of tracking mobile devices is to use a Wi-Fi dongle. A Wi-Fi dongle, also known as a Wi-Fi adapter or USB Wi-Fi adapter, is primarily designed to allow a computer to connect to Wi-Fi networks and access the internet wirelessly. It does this by receiving and transmitting Wi-Fi signals between the computer and nearby Wi-Fi access points.

### While a Wi-Fi dongle can detect and interact with nearby Wi-Fi devices, it typically does not have the capability to directly record the number of mobile devices around it. The functionality of a Wi-Fi dongle is focused on networking and internet connectivity for the host device (e.g., a computer, laptop, or tablet) rather than monitoring or tracking nearby devices.

### To record the number of mobile devices around a specific location, specialized equipment and software are usually required. For example:

#### 1.	Wi-Fi Access Points: Wi-Fi routers and access points can log information about devices that connect to them, including the number of connected devices. However, this is limited to devices actively connecting to the specific Wi-Fi network managed by the access point.

#### 2.	Wireless Network Monitoring Tools: Dedicated network monitoring tools can provide insights into nearby Wi-Fi networks, devices, and traffic. These tools may offer features to analyze network activity, detect devices, and monitor signal strength, but they typically require specific hardware and software configurations.

#### 3.	Location-Based Services (LBS): Some systems, such as location-based services used in retail or public spaces, can track the presence of mobile devices using technologies like Wi-Fi, Bluetooth, or RFID. These systems often combine hardware sensors with software platforms to monitor and analyse device presence and movements.

### In summary, while a Wi-Fi dongle is essential for wireless internet connectivity on a computer, it is not typically designed to record or track the number of mobile devices around it. However, with specialised software, it can be used to estimate the number of mobile devices in the surrounding area.



### To monitor and track the number of mobile devices around a specific location using Wi-Fi signals, specialized software tools designed for wireless network monitoring and analysis are needed.

### Types of Wireless Network Monitoring Tools:

####  1.	Kismet: Kismet is a popular open-source wireless network detector, sniffer, and intrusion detection system. It can detect and track Wi-Fi networks, devices, and their activities.

####  2.	Wireshark: While primarily a packet analyser, Wireshark can be used to capture and analyse Wi-Fi traffic, including the presence and activities of nearby devices.

#### 3.	NetSpot: NetSpot is a Wi-Fi analyser and survey tool that can scan and visualize Wi-Fi networks, including nearby devices and their signal strengths.

#### 4.	Acrylic Wi-Fi: Acrylic Wi-Fi is a Wi-Fi analyser and scanning tool that can detect nearby Wi-Fi networks, devices, and signal strengths. It offers features for network troubleshooting and analysis.

### These software tools vary in their features, capabilities, and compatibility with different Wi-Fi hardware. Which one fits best depends on your specific requirements, such as the size of the area to monitor, the level of detail needed, and the types of devices to track. Additionally, some software may require compatible hardware or access to specific Wi-Fi infrastructure for optimal functionality.

## PCAP Data

### Pcap (packet capture) data is a common and valuable form of data that can be collected with a Wi-Fi dongle for network analysis. Pcap files contain recorded network traffic, including packet headers and payloads, captured from a network interface. They are widely used by network administrators, security professionals, researchers, and analysts for various purposes, including:

#### Network Troubleshooting: Pcap data allows analysts to examine network traffic to diagnose and troubleshoot network issues such as congestion, packet loss, and latency.

#### Security Analysis: Security analysts use pcap data to investigate security incidents, detect intrusions, and analyse malicious activities such as malware infections, phishing attacks, and network reconnaissance.

#### Protocol Analysis: Pcap data provides insights into how network protocols are used and can help in protocol debugging, performance optimization, and protocol compliance analysis.

#### Forensic Analysis: Pcap data can be used in digital forensics investigations to reconstruct network communications, identify suspicious behaviour, and gather evidence for legal proceedings.

#### Traffic Monitoring and Analysis: Pcap data allows for the monitoring and analysis of network traffic patterns, trends, and usage statistics, which can be valuable for capacity planning, network optimization, and policy enforcement.

### Tools like Wireshark, tcpdump, and tshark are commonly used to capture, analyse, and manipulate pcap data. Overall, pcap data serves as a fundamental source of information for understanding and analysing network behaviour and is an essential component of network analysis workflows.

## Estimation of Number of Mobile Devices Around using Pcap data

#### To estimate the number of mobile devices from pcap data, one can leverage Wireshark's filtering and analysis capabilities in the following steps:

#### Capture Packets: Start Wireshark and capture packets on the network interface where mobile devices are connected.

#### Filter Mobile Device Traffic: Apply a display filter to focus on packets originating from or destined to mobile devices. This could include filtering by MAC address, IP address range, or specific protocols commonly used by mobile devices (e.g., HTTP, HTTPS, DNS).

#### Analyse Unique Identifiers: Look for unique identifiers such as MAC addresses, IP addresses, or device fingerprints to identify individual mobile devices.

#### Track Connections: Monitor connections established by mobile devices to servers or other devices on the network. You can use Wireshark's statistics features to analyse connection statistics and identify unique device connections.

#### Aggregate Data: Aggregate the data by counting unique MAC addresses, IP addresses, or device fingerprints to estimate the number of distinct mobile devices observed in the pcap data.

#### Consider DHCP Traffic: Analyse DHCP (Dynamic Host Configuration Protocol) traffic to identify IP addresses assigned to mobile devices dynamically. This can help in counting the number of active devices on the network.
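
The aggregation step can also be sketched outside Wireshark once the identifiers have been exported. The snippet below is a minimal illustration, assuming the source MAC addresses are already available as plain strings. The helper names `normalize_mac` and `count_distinct_devices` and the sample addresses are hypothetical, not part of the project code.

```python
def normalize_mac(mac):
    """Return a MAC address in lowercase, colon-separated form."""
    hex_digits = mac.replace("-", "").replace(":", "").replace(".", "").lower()
    if len(hex_digits) != 12 or any(c not in "0123456789abcdef" for c in hex_digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


def count_distinct_devices(macs):
    """Count distinct MAC addresses, treating different spellings as equal."""
    return len({normalize_mac(mac) for mac in macs})


# Made-up sample: three spellings of one address plus one other address
sample = ["AA:BB:CC:00:11:22", "aa-bb-cc-00-11-22", "aabb.cc00.1122", "aa:bb:cc:00:11:23"]
print(count_distinct_devices(sample))
```

Normalizing before counting avoids inflating the estimate when different tools export the same address in different notations.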






## Program Code to Convert Pcap Data




### Install libraries


In [None]:
!pip install scapy

In [None]:
## A simple Python script to read a PCAP file and print out some basic information about the packets:
from scapy.all import *

def analyze_pcap(pcap_file):
    packets = rdpcap(pcap_file)  # Read PCAP file

    print("Total packets in the PCAP file:", len(packets))

    # Analyze each packet
    for packet in packets:
        print("\nPacket summary:")
        print(packet.summary())  # Print summary of packet

        # You can add more analysis based on your requirements
        # For example:
        # - Extract source and destination IP addresses
        # - Extract protocols used
        # - Extract payload and analyze its contents

pcap_file = "your_pcap_file.pcap"  # Provide path to your PCAP file
analyze_pcap(pcap_file)


### Replace "your_pcap_file.pcap" with the path to the actual PCAP file. This script prints the total number of packets in the PCAP file and then prints a summary of each packet.

### Customize the analyze_pcap function to perform more specific analysis based on your requirements. For example, extract specific fields from each packet and perform more detailed analysis on them.

### Remember to have appropriate permissions to read the PCAP file and ensure that the file path is correct. Additionally, keep in mind that working with large PCAP files can consume a significant amount of memory, so it's important to optimize the analysis code accordingly.

### To eliminate repeat records while analyzing packets using Scapy, you can maintain a set of unique identifiers for packets you've already processed. Here's how you can modify the script to achieve this:

In [None]:
def analyze_pcap(pcap_file):
    packets = rdpcap(pcap_file)  # Read PCAP file

    print("Total packets in the PCAP file:", len(packets))

    # Set to store unique packet identifiers
    unique_packets = set()

    # Analyze each packet
    for packet in packets:
        # Generate an identifier from the packet's raw bytes, so that
        # packets with identical content map to the same value
        packet_id = hash(bytes(packet))

        # Check if the packet is unique
        if packet_id not in unique_packets:
            print("\nPacket summary:")
            print(packet.summary())  # Print summary of packet

            # Add packet identifier to set of unique packets
            unique_packets.add(packet_id)

            # You can add more analysis based on your requirements
            # For example:
            # - Extract source and destination IP addresses
            # - Extract protocols used
            # - Extract payload and analyze its contents

pcap_file = "your_pcap_file.pcap"  # Provide path to your PCAP file
analyze_pcap(pcap_file)


### In the modified version of the script above, before printing the summary of each packet, it checks whether the hash of the packet is already in the set of unique packet identifiers (unique_packets). If it is not present, it prints the packet summary, adds the hash to the set of unique packet identifiers, and proceeds to the next packet. This way, only unique packets are processed and repeated records are eliminated from the analysis.

## Extract Mobile Device Information from Pcap

In [None]:
from scapy.all import *
# Imported explicitly: not every Scapy version loads the HTTP layer via scapy.all
from scapy.layers.http import HTTPRequest

def is_mobile(packet):
    # Check for known mobile device MAC address prefixes (placeholders).
    # Use lowercase, because Scapy reports MAC addresses in lowercase.
    mobile_mac_prefixes = ["xx:xx:xx", "yy:yy:yy"]  # Add more prefixes as needed

    # Check for mobile device MAC address
    if packet.haslayer(Ether):
        src_mac = packet[Ether].src
        if any(src_mac.startswith(prefix) for prefix in mobile_mac_prefixes):
            return True

    # Check for mobile device IP address
    if packet.haslayer(IP):
        src_ip = packet[IP].src
        # Assuming mobile devices have IP addresses in certain ranges
        if src_ip.startswith("192.168.1.") or src_ip.startswith("10.0.0."):
            return True

    # Check for User-Agent header in HTTP requests
    if packet.haslayer(HTTPRequest):
        # The header is stored as bytes and is None when it is absent
        user_agent = packet[HTTPRequest].User_Agent or b""
        if b"Mobile" in user_agent:
            return True

    # Add more checks as needed

    return False

def analyze_pcap(pcap_file):
    packets = rdpcap(pcap_file)  # Read PCAP file

    print("Total packets in the PCAP file:", len(packets))

    # Analyze each packet
    for packet in packets:
        # Check if packet is from a mobile device
        if is_mobile(packet):
            print("\nPacket from a mobile device:")
            print(packet.summary())  # Print summary of packet
            # Add more analysis specific to mobile devices if needed
        else:
            # Handle packets not from mobile devices
            pass

        # You can add more analysis based on your requirements
        # For example:
        # - Extract source and destination IP addresses
        # - Extract protocols used
        # - Extract payload and analyze its contents


pcap_file = "your_pcap_file.pcap"  # Provide path to your PCAP file
analyze_pcap(pcap_file)


### To distinguish packets from mobile devices using Scapy, you typically look at characteristics such as MAC addresses, IP addresses, User-Agent strings, and sometimes specific protocols associated with mobile devices.

### The script above defines a function is_mobile() that checks for characteristics typically associated with mobile devices, such as MAC addresses, IP addresses, and User-Agent strings. The analyze_pcap() function then uses this function to filter packets and print information about packets originating from mobile devices.

### Customize the is_mobile() function further based on knowledge of the network and the specific characteristics of the mobile devices of interest. Additionally, the analysis can be extended to include more features or characteristics if needed.
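
The MAC-prefix check can be made less sensitive to notation by comparing normalized vendor prefixes (the first three octets, known as the OUI). The sketch below is one way to do this, assuming the prefixes of interest have been collected by hand. The names `MOBILE_OUI_PREFIXES`, `oui_prefix`, and `has_mobile_oui` and the prefix values are hypothetical placeholders.

```python
# Hypothetical placeholder prefixes; replace them with real vendor prefixes
MOBILE_OUI_PREFIXES = {"aa:bb:cc", "11:22:33"}


def oui_prefix(mac):
    """Return the first three octets of a MAC address in lowercase colon form."""
    return mac.lower().replace("-", ":")[:8]


def has_mobile_oui(mac, prefixes=MOBILE_OUI_PREFIXES):
    """Return True if the address starts with one of the given vendor prefixes."""
    return oui_prefix(mac) in prefixes


print(has_mobile_oui("AA:BB:CC:12:34:56"))
print(has_mobile_oui("de:ad:be:ef:00:01"))
```

A set lookup on the normalized prefix gives the same result for upper- and lowercase input, which a plain `startswith` comparison does not.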



## Pedestrian Analysis

#### Analyzing pedestrian behavior with PCAP data, here in order to estimate how much exposure an advertisement on a postbox receives, involves extracting relevant information from network traffic to infer pedestrian movement or activity. This could include tracking Wi-Fi signals from mobile devices, analyzing HTTP requests from mobile apps, or even monitoring Bluetooth signals if applicable.

### *A generalized approach to using PCAP data for pedestrian analysis:*

##### 1. Collect PCAP Data: Capture network traffic in the area where pedestrian activity is to be analyzed. You can use tools like Wireshark or tcpdump to capture PCAP data.

##### 2. Extract Relevant Packets: Filter out packets that are likely to be associated with pedestrian devices. This could include Wi-Fi probe requests or Bluetooth advertisements from mobile phones or other wearable devices.

##### 3. Identify Pedestrian Devices: Analyze MAC addresses, IP addresses, or other identifiers to distinguish devices likely carried by pedestrians from other network devices.

##### 4. Track Movement: Analyze patterns in device activity to infer pedestrian movement. This might involve tracking the appearance of devices in different areas or correlating movement with other events (e.g., HTTP requests for location-based services).

##### 5. Behavior Analysis: Once pedestrian devices are identified and tracked, analyze their behavior. This could include dwell times in specific locations, routes taken, frequency of appearance, etc.

##### 6. Visualize Results: Present the analysis results visually, such as on a map showing pedestrian paths or in a chart indicating pedestrian activity over time.
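
The dwell-time idea from step 5 can be sketched as follows. This is a minimal illustration, assuming each observation has already been reduced to a `(mac, timestamp)` pair with the timestamp in seconds. The function name `dwell_times` and the sample values are hypothetical.

```python
def dwell_times(observations):
    """Map each MAC address to the seconds between its first and last sighting."""
    first_seen = {}
    last_seen = {}
    for mac, timestamp in observations:
        if mac not in first_seen or timestamp < first_seen[mac]:
            first_seen[mac] = timestamp
        if mac not in last_seen or timestamp > last_seen[mac]:
            last_seen[mac] = timestamp
    return {mac: last_seen[mac] - first_seen[mac] for mac in first_seen}


# Made-up observations: (source MAC, seconds since start of capture)
sample = [
    ("aa:bb:cc:00:00:01", 100.0),
    ("aa:bb:cc:00:00:02", 130.0),
    ("aa:bb:cc:00:00:01", 160.0),
    ("aa:bb:cc:00:00:01", 220.0),
]
print(dwell_times(sample))
```

A device seen only once gets a dwell time of zero, so a real analysis would probably need a rule for such single sightings.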

#### The following Python script uses Scapy to illustrate a basic example of analyzing Wi-Fi probe requests for pedestrian analysis:

In [None]:
from scapy.all import *

def analyze_pcap_for_pedestrians(pcap_file):
    packets = rdpcap(pcap_file)  # Read PCAP file

    # Set to store unique MAC addresses (pedestrian devices)
    pedestrian_devices = set()

    # Analyze each packet
    for packet in packets:
        # Check if the packet is a Wi-Fi probe request
        if packet.haslayer(Dot11ProbeReq):
            # Extract MAC address of the device
            mac_address = packet.addr2
            # Add the MAC address to the set of pedestrian devices
            pedestrian_devices.add(mac_address)

    # Print the identified pedestrian devices
    print("Pedestrian devices identified:")
    for device in pedestrian_devices:
        print(device)

if __name__ == "__main__":
    pcap_file = "your_pcap_file.pcap"  # Provide path to your PCAP file
    analyze_pcap_for_pedestrians(pcap_file)
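
A caveat when counting probe-request MAC addresses: many recent phones send probe requests from randomized addresses, so the number of distinct addresses can overstate the number of devices. Randomized addresses normally have the "locally administered" bit (value 0x02 in the first octet) set. The sketch below separates the two groups under that assumption. The function names and sample addresses are hypothetical.

```python
def is_locally_administered(mac):
    """Return True if the locally administered bit of the first octet is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def split_by_randomization(macs):
    """Split addresses into (globally unique, locally administered) sets."""
    global_macs = {mac for mac in macs if not is_locally_administered(mac)}
    local_macs = set(macs) - global_macs
    return global_macs, local_macs


# Made-up sample addresses
sample = ["3c:5a:b4:00:00:01", "da:a1:19:00:00:02", "02:00:00:00:00:03", "00:1a:2b:00:00:04"]
global_macs, local_macs = split_by_randomization(sample)
print(len(global_macs), len(local_macs))
```

Reporting the two counts separately makes it easier to judge how far the raw total may be inflated.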


## Converting Pcap Data Captured by a Standalone Device to a TXT File

#### PCAP (Packet Capture) data is typically captured by network monitoring tools such as Wireshark, tcpdump, or various other network sniffers. These tools allow users to capture network traffic on a specific network interface. The captured data includes information about each packet transmitted over the network, including source and destination MAC addresses, IP addresses, port numbers, packet contents, timestamps, and more.

#### Once captured, PCAP data can be converted to a human-readable format, such as a text file, for further analysis or processing. The format of each observation consists of:

#### - Source MAC Address
#### - Destination MAC Address
#### - Field 3
#### - Field 4
#### - Field 5
#### - Signal Strength (assumed to be in dBm)
#### - Timestamp

#### Python has libraries like scapy that can handle PCAP files efficiently. After parsing the PCAP data, you can extract relevant information from each packet and format it as desired, such as in the pipe-delimited format listed above.

In [None]:
# CONVERT THE PCAP TO TXT FOR ANALYSIS

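from datetime import datetime

from scapy.all import Dot11, RadioTap, rdpcap

def pcap_to_txt(pcap_file):
    observations = []
    packets = rdpcap(pcap_file)  # Read the pcap file

    for packet in packets:
        if packet.haslayer(Dot11):  # Assuming Wi-Fi traffic
            src_mac = packet.addr2
            dst_mac = packet.addr1
            # Signal strength (dBm) is stored in the RadioTap header, which is
            # only present in monitor-mode captures; skip packets without it
            if not packet.haslayer(RadioTap):
                continue
            signal_strength = packet[RadioTap].dBm_AntSignal
            if signal_strength is None:
                continue
            # packet.time is a Unix epoch value, so convert it before formatting
            timestamp = datetime.fromtimestamp(float(packet.time)).strftime("%Y-%m-%d %H:%M:%S")
            # Fields 3-5 are written as fixed placeholder values
            observation = f"{src_mac}|{dst_mac}|00|05|09|{signal_strength}|{timestamp}"
            observations.append(observation)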

    with open("output.txt", "w") as f:
        for obs in observations:
            f.write(obs + "\n")

# Usage example
pcap_to_txt("input.pcap")


## Python to Convert TXT to Variables for Analysis
#### Use the following Python code to extract the relevant information from each observation in the text file while ignoring fields 3, 4, and 5, and parsing the timestamp to extract the year, month, date, weekday, and hour.


In [None]:
from datetime import datetime

def parse_observation(observation):
    fields = observation.split("|")
    src_mac = fields[0]
    dst_mac = fields[1]
    # fields[2] to fields[4] are ignored here; feel free to parse them as well
    # (for example, a channel value) if your capture device documents them
    signal_strength = int(fields[5])
    timestamp = datetime.strptime(fields[6].strip(), "%Y-%m-%d %H:%M:%S")
    year = timestamp.year
    month = timestamp.month
    date = timestamp.day
    weekday = timestamp.strftime("%A")
    hour = timestamp.hour
    return (src_mac, dst_mac, signal_strength, year, month, date, weekday, hour)

# Read observations from the file
observations = []
with open("your_file.txt", "r") as file:
    for line in file:
        observations.append(line)

# Extract information from each observation
for observation in observations:
    info = parse_observation(observation)
    print("Source MAC Address:", info[0])
    print("Destination MAC Address:", info[1])
    print("Signal Strength:", info[2])
    print("Year:", info[3])
    print("Month:", info[4])
    print("Date:", info[5])
    print("Weekday:", info[6])
    print("Hour:", info[7])




#### The following code uses the pandas library to read the observations from the text file, count the distinct source MAC addresses for each hour, and store the result in a DataFrame. Make sure to replace "your_file.txt" with the path to the actual text file.

In [None]:
import pandas as pd
from datetime import datetime

# Read the text file into a pandas DataFrame
df = pd.read_csv("your_file.txt", sep="|", header=None)
df.columns = ["Source_MAC", "Destination_MAC", "Field_3", "Field_4", "Field_5", "Signal_Strength", "Timestamp"]

# Convert the Timestamp column to datetime objects
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

# Group by hour and count the number of distinct source MAC addresses
df["Hour"] = df["Timestamp"].dt.hour
hourly_mac_counts = df.groupby(["Hour"])["Source_MAC"].nunique().reset_index()

# Save the result to a CSV file
hourly_mac_counts.to_csv("hourly_mac_counts.csv", index=False)
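
Grouping on the hour alone merges the same hour from different days into one bucket. If separate counts per day are wanted, the key can include the date. The sketch below shows this with the standard library only, assuming the seven-field pipe-delimited format described earlier. The function name `distinct_macs_per_hour` and the sample lines are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def distinct_macs_per_hour(lines):
    """Count distinct source MACs for each (date, hour) bucket."""
    buckets = defaultdict(set)
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) != 7:
            continue  # skip malformed lines
        src_mac = fields[0]
        timestamp = datetime.strptime(fields[6], "%Y-%m-%d %H:%M:%S")
        buckets[(timestamp.date().isoformat(), timestamp.hour)].add(src_mac)
    return {key: len(macs) for key, macs in sorted(buckets.items())}


# Made-up sample lines in the seven-field format
sample_lines = [
    "aa:bb:cc:00:00:01|ff:ff:ff:ff:ff:ff|00|05|09|-60|2024-03-04 09:15:00",
    "aa:bb:cc:00:00:02|ff:ff:ff:ff:ff:ff|00|05|09|-72|2024-03-04 09:45:10",
    "aa:bb:cc:00:00:01|ff:ff:ff:ff:ff:ff|00|05|09|-58|2024-03-04 09:50:00",
    "aa:bb:cc:00:00:01|ff:ff:ff:ff:ff:ff|00|05|09|-61|2024-03-05 09:05:00",
]
print(distinct_macs_per_hour(sample_lines))
```

With pandas, a comparable result could be obtained by grouping on both the date and the hour of the Timestamp column.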


## Regression Model

#### Dependent Variable = hourly_mac_counts

#### Independent Variables include:

##### - Month (March, ...)
##### - Weekday (Monday, ..., Sunday)
##### - District (one of 18 district names)
##### - Type_of_Region (Residential, Commerce, Industry)
##### - Nearby_Restaurant (Number of Restaurants)
##### - School (Yes or No)
##### - Public_Services (Yes or No)
##### - Mall (Yes or No)
##### - Bus_Stops (Yes or No)
##### - MIT_Station (Yes or No)
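
The text does not state the functional form of the model. One plausible specification, assuming an ordinary linear regression with the categorical variables entered as 0/1 indicator (dummy) variables, is:

$$
y_i = \beta_0 + \beta_1 x_i + \sum_{k=1}^{K} \gamma_k D_{ik} + \varepsilon_i
$$

Here $y_i$ is the hourly MAC count for observation $i$, $x_i$ is the number of nearby restaurants, $D_{ik}$ are the indicator variables for the levels of Month, Weekday, District, Type of Region, and the Yes/No variables (with one reference level omitted per variable), and $\varepsilon_i$ is the error term. The symbols are introduced here for illustration only.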

#### The code below creates sample data for the independent variables and combines them with the hourly_mac_counts DataFrame. Finally, it saves the combined DataFrame to a CSV file called "combined_data.csv".

#### Replace the sample data in the data dictionary with your actual data for each independent variable. After running the code, you will have a DataFrame containing both the dependent variable hourly_mac_counts and the independent variables.

In [None]:
import pandas as pd
from datetime import datetime

# Assuming you have already loaded and processed the hourly_mac_counts DataFrame

# Create sample data for independent variables
data = {
    "Month": ["March", "March", "March", "March", "March"],
    "Weekday": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "District": ["District A", "District B", "District C", "District D", "District E"],
    "Type_of_Region": ["Residential", "Commerce", "Industry", "Residential", "Commerce"],
    "Nearby_Restaurant": [10, 5, 3, 8, 6],
    "School": ["Yes", "No", "Yes", "No", "Yes"],
    "Public_Services": ["Yes", "Yes", "No", "Yes", "Yes"],
    "Mall": ["No", "Yes", "No", "No", "Yes"],
    "Bus_Stops": ["Yes", "No", "Yes", "Yes", "No"],
    "MIT_Station": ["Yes", "No", "Yes", "Yes", "No"]
}

# Create DataFrame for independent variables
independent_df = pd.DataFrame(data)

# Combine independent variables DataFrame with hourly_mac_counts DataFrame
combined_df = pd.concat([hourly_mac_counts, independent_df], axis=1)

# Save the combined DataFrame to a CSV file
combined_df.to_csv("combined_data.csv", index=False)
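
Before a regression can be fitted, the text-valued variables have to be turned into numbers. A common approach is dummy encoding with one reference level dropped per variable. The sketch below shows the idea with the standard library only. The function name `dummy_encode` and the choice of reference levels are assumptions for illustration.

```python
def dummy_encode(rows, column, reference):
    """Replace a categorical column with 0/1 indicator columns, omitting `reference`."""
    levels = sorted({row[column] for row in rows} - {reference})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Made-up rows using a few of the variables listed above
rows = [
    {"Type_of_Region": "Residential", "School": "Yes", "Nearby_Restaurant": 10},
    {"Type_of_Region": "Commerce", "School": "No", "Nearby_Restaurant": 5},
    {"Type_of_Region": "Industry", "School": "Yes", "Nearby_Restaurant": 3},
]
encoded = dummy_encode(rows, "Type_of_Region", reference="Residential")
encoded = dummy_encode(encoded, "School", reference="No")
print(encoded[0])
```

Dropping one level per variable avoids perfectly collinear columns when the model includes an intercept.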



## Time Series and Seasonal Features in Addition to Cross-sectional Analysis

#### Considering seasonal patterns and time series aspects in pedestrian analysis can provide valuable insights into pedestrian behavior, such as understanding how pedestrian activity varies over different times of the day, days of the week, or across seasons. Here's how you can incorporate seasonal patterns and time series considerations into pedestrian analysis:

##### 1. Data Collection: Collect pedestrian data over an extended period, capturing timestamps in pcap data along with pedestrian activity.

##### 2. Data Preprocessing: Preprocess the collected data, ensuring consistency in timestamps and handling missing or erroneous data points.

#### *Time Series Analysis:*

Seasonal Decomposition: Decompose the time series data into its seasonal, trend, and residual components using methods such as STL (Seasonal and Trend decomposition using Loess) or classical decomposition techniques.

Seasonal Index Calculation: Calculate seasonal indices to quantify the magnitude of seasonal fluctuations in pedestrian activity. This allows you to compare activity levels across different seasons.

Time Series Visualization: Visualize the time series data to identify patterns, trends, and seasonal variations. This could include plotting pedestrian counts over time, creating heatmaps of activity levels by hour or day, or using other visualization techniques to highlight seasonal patterns.

#### *Statistical Analysis:*

Seasonal Analysis: Conduct statistical tests to determine if there are significant differences in pedestrian activity across seasons.

Time Series Forecasting: Use time series forecasting methods to predict future pedestrian activity based on historical patterns. This can help in resource allocation and planning for pedestrian infrastructure.

#### *Feature Engineering:*

Temporal Features: Engineer features such as time of day, day of the week, month, or season to capture temporal patterns in pedestrian behavior.

Holiday and Event Indicators: Incorporate indicators for holidays, special events, or weather conditions that may influence pedestrian activity.

#### *Machine Learning Models:*

Seasonal Regression Models: Build regression models that incorporate seasonal predictors to estimate pedestrian counts.

Time Series Forecasting Models: Train time series forecasting models such as ARIMA, SARIMA, or LSTM networks to predict pedestrian activity based on historical data.

Anomaly Detection: Develop anomaly detection models to identify unusual spikes or dips in pedestrian activity that deviate from expected seasonal patterns.

#### *Evaluation and Interpretation:*

Evaluate the performance of your models using metrics such as mean absolute error (MAE), root mean squared error (RMSE), or accuracy.

Interpret the results of your analysis to derive actionable insights for urban planning, transportation management, or pedestrian safety initiatives.

By incorporating seasonal patterns and time series considerations into your pedestrian analysis, you can gain a deeper understanding of pedestrian behavior and make more informed decisions to optimize pedestrian infrastructure and enhance urban livability.
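
Two of the quantities mentioned above, seasonal indices and the MAE and RMSE error metrics, are simple enough to sketch directly. The example below assumes hourly counts grouped by hour of day and defines a seasonal index as the mean count for an hour divided by the mean across all hours. The function names and sample numbers are hypothetical.

```python
from math import sqrt
from statistics import mean


def seasonal_indices(counts_by_hour):
    """Ratio of each hour's mean count to the mean across hours (1.0 = average)."""
    hour_means = {hour: mean(values) for hour, values in counts_by_hour.items()}
    overall = mean(hour_means.values())
    return {hour: hour_mean / overall for hour, hour_mean in hour_means.items()}


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


# Made-up counts observed at 08:00, 12:00, and 18:00 on two days
counts_by_hour = {8: [10, 14], 12: [20, 28], 18: [30, 18]}
print(seasonal_indices(counts_by_hour))
print(mae([10, 20], [12, 16]), rmse([0, 0], [3, 3]))
```

An index above 1.0 marks an hour that is busier than average, and an index below 1.0 marks a quieter one.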