# <center><font color = '#DF9166' size = 20 center> **Dataset Overview**</font></center>



This dataset provides a detailed view of telecommunications network performance, user behavior, and device activity. Each record captures session-specific metrics, including identifiers, throughput, latency, data usage, and device details. These attributes offer valuable insights into user interactions with the network, highlighting performance trends and application usage patterns.

In the telecommunications sector, optimizing network performance and enhancing user experience are critical. Analyzing metrics such as latency, throughput, and retransmission volumes alongside application-specific data makes it possible to identify network bottlenecks, improve resource allocation, and understand user preferences. The column descriptions below break down these attributes and form a foundation for data-driven problem-solving.

The breakdown below groups the columns into categories. Each category includes a conceptual description, the relevant columns, and the implications for the telecommunications sector:

### <font color = '#DF9166' size=6>**1. Session Identifiers and Metadata**</font>

- **Conceptual Description**:  
In telecommunications, a **bearer** is a logical connection responsible for transmitting user data during a session, such as streaming video or browsing the internet. Each session is uniquely identified by a **bearer id**, allowing the network to track and manage user activity. **Timestamps** (Start, End) and their millisecond offsets (Start ms, End ms) precisely log the session's beginning and end. **Duration**, measured in milliseconds or seconds, indicates how long the session lasted.

- **Columns and Descriptions**:  
  - **`bearer id`**: Unique identifier for each session, used for tracking and analysis.  
  - **`Start`**: Timestamp marking when the session began.  
  - **`Start ms`**: Millisecond offset for the start time, adding precision.  
  - **`End`**: Timestamp marking when the session concluded.  
  - **`End ms`**: Millisecond offset for the end time, adding precision.  
  - **`Dur. (ms)`**: Total session duration in milliseconds.  
  - **`Dur. (s)`**: Total session duration in seconds.  


- **Implications**:  
These metrics are fundamental for understanding and managing network activity. Telecommunications providers can:  
  - Track and analyze session timing and duration for resource allocation.  
  - Diagnose issues related to abnormal session behavior (e.g., unusually short or extended durations).  
  - Maintain efficient session-level network operations.
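
As a minimal sketch of how the four timing columns fit together, the snippet below rebuilds a session's duration from `Start`, `Start ms`, `End`, and `End ms`. The timestamp layout in `TS_FORMAT`, the idea that the millisecond offset is simply added to the timestamp, and the sample record are all assumptions made for illustration; the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout; adjust to match the actual `Start`/`End` strings.
TS_FORMAT = "%m/%d/%Y %H:%M"


def precise_time(stamp: str, ms_offset: float) -> datetime:
    """Combine a timestamp string with its millisecond offset."""
    return datetime.strptime(stamp, TS_FORMAT) + timedelta(milliseconds=ms_offset)


def derived_duration_ms(record: dict) -> float:
    """Session duration in milliseconds, derived from the four timing columns."""
    start = precise_time(record["Start"], record["Start ms"])
    end = precise_time(record["End"], record["End ms"])
    # Dividing two timedeltas gives an exact ratio, avoiding float rounding.
    return (end - start) / timedelta(milliseconds=1)


# Made-up record for illustration only.
session = {
    "Start": "4/4/2019 12:01", "Start ms": 770.0,
    "End": "4/4/2019 12:03", "End ms": 662.0,
}
duration = derived_duration_ms(session)
print(duration)
```

Comparing a derived value like this against `Dur. (ms)` is one way to spot sessions whose recorded duration looks abnormal.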

### <font color = '#DF9166' size=6>**2. Device and User Information**</font>


- **Conceptual Description**:  
This category provides insights into the users and devices interacting with the network. The **IMSI (International Mobile Subscriber Identity)** is a unique number that identifies each subscriber within the mobile network. The **MSISDN (Mobile Subscriber ISDN Number)**, often referred to as the user’s phone number, is the globally recognized identifier for communication. The **IMEI (International Mobile Equipment Identity)** is a unique identifier for the device hardware, enabling device-level tracking. **Handset Manufacturer** and **Handset Type** specify the brand and model of the mobile device, providing insights into the distribution of device types across the network. **Last Location Name** refers to the user’s geographic location at the session’s end, such as the serving cell tower or region.

- **Columns and Descriptions**:  
  - **`IMSI`**: Unique identifier for mobile subscribers, essential for session tracking.  
  - **`MSISDN/Number`**: The subscriber’s phone number in international format, used for global identification.  
  - **`IMEI`**: Unique hardware identifier of the user’s device, aiding in device-specific analysis.  
  - **`Handset Manufacturer`**: The brand of the mobile device (e.g., Apple, Samsung).  
  - **`Handset Type`**: Specific model or type of the device (e.g., iPhone 14, Galaxy S21).  
  - **`Last Location Name`**: Geographic location of the user’s device at the session’s end, used for coverage and location analysis.  

- **Implications**:  
These columns are critical for network management and user analysis. They enable:  
  - Tracking users and devices across sessions.  
  - Monitoring the diversity and capabilities of devices for compatibility optimization.  
  - Understanding user locations for location-based services and identifying regions with network performance or coverage issues.  
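
The tracking and device-diversity points above can be sketched with plain counting. The example below counts sessions per subscriber (`IMSI`) and distinct devices per manufacturer (keyed by `IMEI`). The records are made up for illustration, and the variable names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up records for illustration; real values come from the dataset.
sessions = [
    {"IMSI": "sub-1", "IMEI": "dev-1", "Handset Manufacturer": "Apple"},
    {"IMSI": "sub-1", "IMEI": "dev-1", "Handset Manufacturer": "Apple"},
    {"IMSI": "sub-2", "IMEI": "dev-2", "Handset Manufacturer": "Samsung"},
    {"IMSI": "sub-3", "IMEI": "dev-3", "Handset Manufacturer": "Samsung"},
]

# Number of sessions recorded for each subscriber.
sessions_per_subscriber = Counter(s["IMSI"] for s in sessions)

# Count each physical device once per manufacturer by collecting unique IMEIs.
imeis_by_maker = defaultdict(set)
for s in sessions:
    imeis_by_maker[s["Handset Manufacturer"]].add(s["IMEI"])
devices_per_maker = {maker: len(imeis) for maker, imeis in imeis_by_maker.items()}

print(sessions_per_subscriber.most_common(1))
print(devices_per_maker)
```

Counting unique `IMEI` values rather than rows keeps a heavy user with many sessions from inflating a manufacturer's device share.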

### <font color = '#DF9166' size=6>**3. Throughput and Latency Metrics**</font>


- **Conceptual Description**:  
This category measures the efficiency and responsiveness of the network. **Throughput** refers to the rate of data transfer and is categorized into **downlink (DL)**, the data received by the user, and **uplink (UL)**, the data sent from the user. High throughput indicates a fast and efficient network. **Latency**, measured as **Round Trip Time (RTT)**, represents the delay in data transmission. It is the time taken for a data packet to travel from the user to the network and back. Lower latency is crucial for applications like video calls, gaming, and real-time services. 
  
  By combining throughput and latency metrics, telecommunications providers can monitor and optimize network performance, ensuring users experience fast and reliable connections.

- **Columns and Descriptions**:  
  - **`Avg RTT DL (ms)`**: Average round-trip time for data in the downlink direction, measuring download latency. Low values indicate better performance.  
  - **`Avg RTT UL (ms)`**: Average round-trip time for data in the uplink direction, measuring upload latency.  
  - **`Avg Bearer TP DL (kbps)`**: Average throughput in the downlink direction, reflecting download speeds. Higher values indicate faster downloads.  
  - **`Avg Bearer TP UL (kbps)`**: Average throughput in the uplink direction, showing upload speeds.  

- **Implications**:  
These metrics provide a detailed picture of network quality and are used to:  
  - Assess speed (throughput) and responsiveness (latency) of the network.  
  - Diagnose issues such as congestion or bottlenecks.  
  - Ensure quality of service for latency-sensitive applications. 
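
To illustrate how latency and throughput can be read together, the sketch below flags a session as degraded when either RTT direction is high or the downlink throughput is low. The thresholds `MAX_RTT_MS` and `MIN_DL_KBPS` are hypothetical values picked for the example, not figures from the dataset, and missing RTT values are assumed to arrive as `None`.

```python
# Hypothetical thresholds chosen for illustration, not taken from the dataset.
MAX_RTT_MS = 150.0
MIN_DL_KBPS = 250.0


def is_degraded(record: dict) -> bool:
    """Flag a session with high latency in either direction or low DL throughput."""
    rtts = [record.get("Avg RTT DL (ms)"), record.get("Avg RTT UL (ms)")]
    # Missing RTT measurements (None) are skipped rather than treated as high.
    high_latency = any(r is not None and r > MAX_RTT_MS for r in rtts)
    low_throughput = record["Avg Bearer TP DL (kbps)"] < MIN_DL_KBPS
    return high_latency or low_throughput


# Made-up sessions for illustration only.
healthy = {"Avg RTT DL (ms)": 40.0, "Avg RTT UL (ms)": 5.0, "Avg Bearer TP DL (kbps)": 1200.0}
laggy = {"Avg RTT DL (ms)": 300.0, "Avg RTT UL (ms)": 5.0, "Avg Bearer TP DL (kbps)": 1200.0}
slow = {"Avg RTT DL (ms)": 40.0, "Avg RTT UL (ms)": None, "Avg Bearer TP DL (kbps)": 44.0}

print([is_degraded(s) for s in (healthy, laggy, slow)])
```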


### <font color = '#DF9166' size=6>**4. Data Volume Metrics**</font>


- **Conceptual Description**:  
This category measures the amount of data transferred during a session, including total and application-specific data usage. **Data Volume** captures the total data a user receives (**downlink**) or sends (**uplink**) and provides granular insights into application-specific consumption. Metrics for services like YouTube, Netflix, and social media allow for an understanding of traffic patterns and user behavior. 

  By providing both aggregate and specific metrics, this data offers a comprehensive view of user activity and helps in delivering a tailored, efficient network experience.

- **Implications**:  
These metrics are crucial for:  
  - **Billing**: Ensuring accurate usage-based charges for users.  
  - **Resource Allocation**: Identifying high-demand applications and optimizing network resources accordingly.  
  - **User Behavior Analysis**: Understanding app usage trends and preferences to improve service offerings.  
  - **Network Optimization**: Monitoring traffic by application to manage congestion and enhance performance.

- **Columns and Descriptions**:  
  - **`Total DL (Bytes)`**: Total data received by the user during the session. Reflects overall download consumption.  
  - **`Total UL (Bytes)`**: Total data sent by the user during the session. Indicates overall upload activity.  
  - **`HTTP DL (Bytes)`**: Volume of data received over HTTP protocols. Highlights browsing and web-related activity.  
  - **`HTTP UL (Bytes)`**: Volume of data sent over HTTP protocols. Represents uploads through web-based platforms.  
  - **Application-Specific Columns**:  
    - **`YouTube DL/UL (Bytes)`**: Data sent and received through YouTube, providing insights into video consumption.  
    - **`Netflix DL/UL (Bytes)`**: Data transferred while using Netflix, showing streaming activity.  
    - **`Social Media DL/UL (Bytes)`**: Data for platforms like Facebook, Instagram, etc., indicating social media usage.  
    - **`Google DL/UL (Bytes)`**: Data transferred via Google services (e.g., search, maps, cloud).  
    - **`Email DL/UL (Bytes)`**: Data sent and received through email applications.  
    - **`Gaming DL/UL (Bytes)`**: Data associated with online gaming sessions.  
    - **`Other DL/UL (Bytes)`**: Data that doesn't fall into predefined application categories.
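
One way to find high-demand applications is to add each application's downlink and uplink bytes across sessions and rank the totals. The sketch below assumes each application has a separate `<App> DL (Bytes)` and `<App> UL (Bytes)` column, as listed in the field table; the byte values and the `make_record` helper are made up for illustration.

```python
APPS = ["YouTube", "Netflix", "Social Media", "Google", "Email", "Gaming"]


def make_record(volumes: dict) -> dict:
    """Illustration helper: build a record from {app: (dl_bytes, ul_bytes)}."""
    record = {}
    for app, (dl, ul) in volumes.items():
        record[f"{app} DL (Bytes)"] = dl
        record[f"{app} UL (Bytes)"] = ul
    return record


def app_totals(records: list) -> dict:
    """Total bytes (downlink + uplink) per application across all sessions."""
    totals = {app: 0 for app in APPS}
    for record in records:
        for app in APPS:
            totals[app] += record[f"{app} DL (Bytes)"] + record[f"{app} UL (Bytes)"]
    return totals


# Made-up byte counts for two sessions.
records = [
    make_record({"YouTube": (500, 20), "Netflix": (300, 10), "Social Media": (50, 5),
                 "Google": (40, 8), "Email": (10, 4), "Gaming": (700, 30)}),
    make_record({"YouTube": (100, 10), "Netflix": (900, 15), "Social Media": (20, 2),
                 "Google": (30, 3), "Email": (5, 1), "Gaming": (0, 0)}),
]

totals = app_totals(records)
# Rank applications from most to least traffic.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(ranked[:3])
```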

### <font color = '#DF9166' size=6>**5. Activity Durations**</font>

- **Conceptual Description**:  
**Activity Duration** measures the time spent actively transmitting or receiving data during a session, excluding idle periods in which no data is transmitted for more than 500 ms. This provides a focused view of meaningful network usage.  

- **Implications**:  
  - Reflects session efficiency and user engagement.  
  - Highlights actual usage patterns for optimizing network resources.  

- **Column Descriptions**:  

    - **`Activity Duration DL (ms)`**: Time spent actively downloading data during the session, excluding idle periods exceeding 500 ms.  

    - **`Activity Duration UL (ms)`**: Time spent actively uploading data during the session, excluding idle periods exceeding 500 ms.  
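
A simple way to express session efficiency is the share of the session spent actively transferring data. The sketch below divides the activity duration by `Dur. (ms)`; it assumes both columns are in milliseconds as described, and the sample values and the `activity_ratio` helper are hypothetical.

```python
from typing import Optional


def activity_ratio(record: dict, direction: str = "DL") -> Optional[float]:
    """Fraction of the session spent actively transferring data in one direction."""
    total_ms = record["Dur. (ms)"]
    if not total_ms:
        # Avoid dividing by zero for sessions with no recorded duration.
        return None
    return record[f"Activity Duration {direction} (ms)"] / total_ms


# Made-up session for illustration only.
session = {
    "Dur. (ms)": 120000,
    "Activity Duration DL (ms)": 30000,
    "Activity Duration UL (ms)": 6000,
}

dl_ratio = activity_ratio(session, "DL")
ul_ratio = activity_ratio(session, "UL")
print(dl_ratio, ul_ratio)
```

A low ratio suggests a session that stayed open while mostly idle, which matters when sizing resources by actual usage rather than by session length.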


### <font color = '#DF9166' size=6>**6. Packet and Retransmission Metrics**</font>


- **Conceptual Description**:  
Packet and retransmission metrics provide insights into network reliability and performance.
  - **TCP Retransmissions** measure the volume of data resent due to lost or corrupted packets, often caused by congestion or poor signal.
  - **Throughput Ranges** categorize network speeds into performance brackets, indicating the percentage of session time spent at various throughput levels.

- **Implications**:  
  - **Network Reliability**: High retransmission rates signal potential issues like congestion or packet loss.  
  - **Quality of Service (QoS)**: Throughput range analysis helps assess service consistency and identify areas for optimization.  
  - **Performance Monitoring**: Identifies underperforming sessions or areas needing improved capacity.


- **Column Descriptions**:  

   -  **`TCP DL Retrans. Vol (Bytes)`**: Volume of TCP data, in bytes, detected as retransmitted in the downlink direction. Higher values mean more data had to be resent to the user and point to potential reliability issues when downloading.

   -  **`TCP UL Retrans. Vol (Bytes)`**: Volume of TCP data, in bytes, detected as retransmitted in the uplink direction. Higher values mean the user’s device had to resend more data and point to potential issues when uploading.

   -  **`DL TP < 50 Kbps (%)`**: Percentage of session time with downlink throughput below 50 kbps. Time spent at these very low download speeds points to a poor user experience.

   -  **`50 Kbps < DL TP < 250 Kbps (%)`**: Percentage of session time with downlink throughput between 50 kbps and 250 kbps. These moderate download speeds can point to bandwidth constraints.

   -  **`250 Kbps < DL TP < 1 Mbps (%)`**: Percentage of session time with downlink throughput between 250 kbps and 1 Mbps. These speeds are acceptable for basic usage and serve as a benchmark for network performance.

   -  **`DL TP > 1 Mbps (%)`**: Percentage of session time with downlink throughput above 1 Mbps. This is the highest download bracket and reflects a good user experience.

   -  **`UL TP < 10 Kbps (%)`**: Percentage of session time with uplink throughput below 10 kbps. Such extremely low upload speeds often indicate severe performance issues.

   -  **`10 Kbps < UL TP < 50 Kbps (%)`**: Percentage of session time with uplink throughput between 10 kbps and 50 kbps. These moderate upload speeds help in understanding uplink limitations.

   -  **`50 Kbps < UL TP < 300 Kbps (%)`**: Percentage of session time with uplink throughput between 50 kbps and 300 kbps. These upload speeds are reasonable for most standard tasks.

   -  **`UL TP > 300 Kbps (%)`**: Percentage of session time with uplink throughput above 300 kbps. These high upload speeds suit demanding tasks such as live streaming or large file transfers.
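
The reliability and consistency checks described above can be sketched as two small helpers. The first expresses retransmitted downlink bytes as a share of `Total DL (Bytes)`. The second checks that the four downlink brackets add up to roughly 100%, which assumes the brackets partition the whole session time; that is an assumption, not something the dataset documentation states. The tolerance and the sample values are made up.

```python
from typing import Optional

DL_BRACKETS = [
    "DL TP < 50 Kbps (%)",
    "50 Kbps < DL TP < 250 Kbps (%)",
    "250 Kbps < DL TP < 1 Mbps (%)",
    "DL TP > 1 Mbps (%)",
]


def dl_retrans_share(record: dict) -> Optional[float]:
    """Retransmitted downlink bytes as a fraction of all downlink bytes."""
    total = record["Total DL (Bytes)"]
    if not total:
        return None
    return record["TCP DL Retrans. Vol (Bytes)"] / total


def brackets_sum_to_100(record: dict, tolerance: float = 1.0) -> bool:
    """Check that the downlink brackets cover about 100% of session time (assumed)."""
    return abs(sum(record[col] for col in DL_BRACKETS) - 100.0) <= tolerance


# Made-up session for illustration only.
session = {
    "Total DL (Bytes)": 100000,
    "TCP DL Retrans. Vol (Bytes)": 2000,
    "DL TP < 50 Kbps (%)": 10.0,
    "50 Kbps < DL TP < 250 Kbps (%)": 20.0,
    "250 Kbps < DL TP < 1 Mbps (%)": 30.0,
    "DL TP > 1 Mbps (%)": 40.0,
}

share = dl_retrans_share(session)
consistent = brackets_sum_to_100(session)
print(share, consistent)
```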

### <font color = '#DF9166' size=6>**7. Data Volume Segmentation**</font>


- **Conceptual Description**:  
Data volume segmentation defines volume thresholds and counts the seconds of a session during which the data volume fell within each range, separately for uplink and downlink. This provides a detailed understanding of consumption patterns across sessions.

- **Implications**:  
  - Identifies light or heavy data users.  
  - Supports resource allocation by highlighting usage trends.  
  - Detects unusual or anomalous user behavior.  

- **Column Descriptions**:  

    - **`Nb of sec with Vol DL < 6250B`**: Seconds during which the downlink data volume was less than 6,250 bytes, reflecting minimal data usage.

    - **`Nb of sec with 6250B < Vol DL < 31250B`**: Seconds during which the downlink data volume ranged between 6,250 and 31,250 bytes, indicating low to moderate download activity.

    - **`Nb of sec with 31250B < Vol DL < 125000B`**: Seconds during which the downlink data volume ranged between 31,250 and 125,000 bytes, representing moderate download usage.

    - **`Nb of sec with 125000B < Vol DL`**: Seconds during which the downlink data volume exceeded 125,000 bytes, indicating heavy download activity.

    - **`Nb of sec with Vol UL < 1250B`**: Seconds during which the uplink data volume was less than 1,250 bytes, reflecting minimal upload activity.

    - **`Nb of sec with 1250B < Vol UL < 6250B`**: Seconds during which the uplink data volume ranged between 1,250 and 6,250 bytes, showing light upload activity.

    - **`Nb of sec with 6250B < Vol UL < 37500B`**: Seconds during which the uplink data volume ranged between 6,250 and 37,500 bytes, indicating moderate upload usage.

    - **`Nb of sec with 37500B < Vol UL`**: Seconds during which the uplink data volume exceeded 37,500 bytes, reflecting heavy upload activity.
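
The byte thresholds above are easier to read once converted to bit rates. Assuming each count refers to the volume transferred within a single second (an interpretation, not something the column names state outright), a threshold in bytes corresponds to a rate of bytes × 8 / 1000 kbps. The sketch below performs that conversion; the helper name is hypothetical.

```python
def bytes_per_second_to_kbps(n_bytes: int) -> float:
    """Convert a per-second byte volume to kilobits per second (1 kbps = 1000 bit/s)."""
    return n_bytes * 8 / 1000


DL_THRESHOLDS_BYTES = [6250, 31250, 125000]
UL_THRESHOLDS_BYTES = [1250, 6250, 37500]

dl_kbps = [bytes_per_second_to_kbps(b) for b in DL_THRESHOLDS_BYTES]
ul_kbps = [bytes_per_second_to_kbps(b) for b in UL_THRESHOLDS_BYTES]

print(dl_kbps)  # [50.0, 250.0, 1000.0]
print(ul_kbps)  # [10.0, 50.0, 300.0]
```

Under that assumption, the downlink thresholds line up with the 50 kbps, 250 kbps, and 1 Mbps throughput brackets, and the uplink thresholds with the 10 kbps, 50 kbps, and 300 kbps brackets.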

### <font color = '#DF9166' size=6>**Field Description**</font>


| **Field**                           | **Description**                                                                               |
|-------------------------------------|-----------------------------------------------------------------------------------------------|
| `bearer id`                       | xDR session identifier                                                                         |
| `Dur. (ms)`                       | Total Duration of the xDR (in ms)                                                             |
| `Start`                           | Start time of the xDR (first frame timestamp)                                                 |
| `Start ms`                        | Milliseconds offset of start time for the xDR (first frame timestamp)                         |
| `End`                             | End time of the xDR (last frame timestamp)                                                    |
| `End ms`                          | Milliseconds offset of end time of the xDR (last frame timestamp)                             |
| `Dur. (s)`                        | Total Duration of the xDR (in s)                                                              |
| `IMSI`                            | International Mobile Subscriber Identity                                                      |
| `MSISDN/Number`                   | MS International PSTN/ISDN Number of mobile - customer number                                 |
| `IMEI`                            | International Mobile Equipment Identity                                                       |
| `Last Location Name`              | User location cell name (2G/3G/4G) at the end of the bearer                                   |
| `Avg RTT DL (ms)`                 | Average Round Trip Time measurement, Downlink direction (ms)                                  |
| `Avg RTT UL (ms)`                 | Average Round Trip Time measurement, Uplink direction (ms)                                    |
| `Avg Bearer TP DL (kbps)`         | Average Bearer Throughput for Downlink (kbps) - based on BDR duration                         |
| `Avg Bearer TP UL (kbps)`         | Average Bearer Throughput for Uplink (kbps) - based on BDR duration                           |
| `TCP DL Retrans. Vol (Bytes)`     | TCP volume of Downlink packets detected as retransmitted (bytes)                              |
| `TCP UL Retrans. Vol (Bytes)`     | TCP volume of Uplink packets detected as retransmitted (bytes)                                |
| `DL TP < 50 Kbps (%)`             | Duration ratio when Bearer Downlink Throughput <                                              |
| `50 Kbps < DL TP < 250 Kbps (%)`  | Duration ratio when Bearer Downlink Throughput range is …                                     |
| `250 Kbps < DL TP < 1 Mbps (%)`   | Duration ratio when Bearer Downlink Throughput range is …                                     |
| `DL TP > 1 Mbps (%)`              | Duration ratio when Bearer Downlink Throughput >                                              |
| `UL TP < 10 Kbps (%)`             | Duration ratio when Bearer Uplink Throughput <                                                |
| `10 Kbps < UL TP < 50 Kbps (%)`   | Duration ratio when Bearer Uplink Throughput range is …                                       |
| `50 Kbps < UL TP < 300 Kbps (%)`  | Duration ratio when Bearer Uplink Throughput range is …                                       |
| `UL TP > 300 Kbps (%)`            | Duration ratio when Bearer Uplink Throughput >                                                |
| `HTTP DL (Bytes)`                 | HTTP data volume (in Bytes) received by the MS during this session                            |
| `HTTP UL (Bytes)`                 | HTTP data volume (in Bytes) sent by the MS during this session                                |
| `Activity Duration DL (ms)`       | Activity Duration for Downlink (ms) - excluding periods of inactivity > 500 ms                |
| `Activity Duration UL (ms)`       | Activity Duration for Uplink (ms) - excluding periods of inactivity > 500 ms                  |
| `Dur. (ms).1`                     | Total Duration of the xDR (in ms)                                                             |
| `Handset Manufacturer`            | Handset manufacturer                                                                          |
| `Handset Type`                    | Handset type of the mobile device                                                             |
| `Nb of sec with 125000B < Vol DL` | Number of seconds with IP Volume DL >                                                        |
| `Nb of sec with 1250B < Vol UL < 6250B` | Number of seconds with IP Volume UL between …                                           |
| `Nb of sec with 31250B < Vol DL < 125000B` | Number of seconds with IP Volume DL between …                                         |
| `Nb of sec with 37500B < Vol UL`  | Number of seconds with IP Volume UL >                                                        |
| `Nb of sec with 6250B < Vol DL < 31250B` | Number of seconds with IP Volume DL between …                                         |
| `Nb of sec with 6250B < Vol UL < 37500B` | Number of seconds with IP Volume UL between …                                         |
| `Nb of sec with Vol DL < 6250B`   | Number of seconds with IP Volume DL <                                                        |
| `Nb of sec with Vol UL < 1250B`   | Number of seconds with IP Volume UL <                                                        |
| `Social Media DL (Bytes)`         | Social Media data volume (in Bytes) received by the MS during this session                   |
| `Social Media UL (Bytes)`         | Social Media data volume (in Bytes) sent by the MS during this session                       |
| `YouTube DL (Bytes)`              | YouTube data volume (in Bytes) received by the MS during this session                        |
| `YouTube UL (Bytes)`              | YouTube data volume (in Bytes) sent by the MS during this session                            |
| `Netflix DL (Bytes)`              | Netflix data volume (in Bytes) received by the MS during this session                        |
| `Netflix UL (Bytes)`              | Netflix data volume (in Bytes) sent by the MS during this session                            |
| `Google DL (Bytes)`               | Google data volume (in Bytes) received by the MS during this session                         |
| `Google UL (Bytes)`               | Google data volume (in Bytes) sent by the MS during this session                             |
| `Email DL (Bytes)`                | Email data volume (in Bytes) received by the MS during this session                          |
| `Email UL (Bytes)`                | Email data volume (in Bytes) sent by the MS during this session                              |
| `Gaming DL (Bytes)`               | Gaming data volume (in Bytes) received by the MS during this session                         |
| `Gaming UL (Bytes)`               | Gaming data volume (in Bytes) sent by the MS during this session                             |
| `Other DL`                        | Other data volume (in Bytes) received by the MS during this session                          |
| `Other UL`                        | Other data volume (in Bytes) sent by the MS during this session                              |
| `Total DL (Bytes)`                | Data volume (in Bytes) received by the MS during this session (IP layer + overhead)          |
| `Total UL (Bytes)`                | Data volume (in Bytes) sent by the MS during this session (IP layer + overhead)              |
