<a href="https://colab.research.google.com/github/Tian1398/IMT542_G4_application/blob/main/IMT542_G8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1 Describe the use, audience, and access of the project information

This project streamlines the extraction and processing of data from Microsoft Teams and converts it into a format suitable for organizational network analysis (ONA) in the Polinode software. Automating this process helps people better understand human interactions from their communication data.
The primary use of this pipeline is to support the analysis of communication patterns and organizational dynamics within a company. The target audience includes data analysts, organizational researchers, HR professionals, and anyone else interested in understanding and improving workflows and efficiency.
Access to the data is restricted to people who have a valid Teams account and are authorized to access it.

# 2 Methodology: how to generate the information structure

Data Extraction
*   Authenticate with a user ID and password to acquire an access token from Microsoft.
*   Use the access token to make requests to the Microsoft Graph API to retrieve Teams data.
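
The two extraction steps above can be sketched as follows. This is a minimal sketch, not the project's extraction code (which is still in progress): it assumes the OAuth 2.0 password grant on the Microsoft identity platform token endpoint and the Graph v1.0 channel-messages endpoint. The helper names (`build_token_request`, `build_messages_url`, `auth_headers`, `fetch_channel_messages`) and the tenant and client IDs are hypothetical. Many tenants block the password grant (for example when multi-factor authentication is enforced), and the app registration needs suitable Graph permissions, so treat this as one possible shape only.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"
LOGIN_BASE = "https://login.microsoftonline.com"


def build_token_request(tenant_id, client_id, username, password):
    """Return (url, form_data) for a password-grant token request (assumed flow)."""
    url = f"{LOGIN_BASE}/{quote(tenant_id)}/oauth2/v2.0/token"
    form_data = {
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
        "scope": "https://graph.microsoft.com/.default",
    }
    return url, form_data


def build_messages_url(team_id, channel_id):
    """Return the Graph URL that lists the messages of one channel."""
    return f"{GRAPH_BASE}/teams/{quote(team_id)}/channels/{quote(channel_id)}/messages"


def auth_headers(access_token):
    """Return the HTTP header that carries the bearer token."""
    return {"Authorization": f"Bearer {access_token}"}


def fetch_channel_messages(tenant_id, client_id, username, password, team_id, channel_id):
    """Acquire a token, then fetch the first page of channel messages (network call)."""
    import requests  # imported lazily so the helpers above work without it

    token_url, form_data = build_token_request(tenant_id, client_id, username, password)
    token_response = requests.post(token_url, data=form_data, timeout=30)
    token_response.raise_for_status()
    access_token = token_response.json()["access_token"]

    response = requests.get(
        build_messages_url(team_id, channel_id),
        headers=auth_headers(access_token),
        timeout=30,
    )
    response.raise_for_status()
    # Graph wraps list results in a "value" array; paging is not handled here.
    return response.json().get("value", [])
```

Keeping the URL and payload builders separate from the network call makes them easy to check without credentials. Larger channels return results in pages, which this sketch does not follow.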

Data Transformation
*   Write Python code to load, clean, manipulate, and output data according to Polinode's requirements.
*   Validate that the original data meets basic requirements before passing it to the transformation.
*   Include documentation to make the process more transparent and troubleshooting easier.
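
The validation step could look something like the sketch below. It assumes the nested teams, channels, and messages layout used in the example section, and the list of required fields is an assumption based on the message schema described in this document. The function name `validate_teams_data` is hypothetical.

```python
# Fields every message is assumed to need before transformation.
REQUIRED_MESSAGE_FIELDS = ("id", "createdDateTime", "from", "body")


def validate_teams_data(data):
    """Return a list of human-readable problems; an empty list means the data passed."""
    teams = data.get("teams")
    if not isinstance(teams, list):
        return ["top-level 'teams' list is missing"]

    problems = []
    for t_idx, team in enumerate(teams):
        for c_idx, channel in enumerate(team.get("channels", [])):
            for m_idx, message in enumerate(channel.get("messages", [])):
                where = f"teams[{t_idx}].channels[{c_idx}].messages[{m_idx}]"
                for field in REQUIRED_MESSAGE_FIELDS:
                    if field not in message:
                        problems.append(f"{where}: missing '{field}'")
                # A sender without a user id cannot become a node.
                user = (message.get("from") or {}).get("user") or {}
                if "from" in message and not user.get("id"):
                    problems.append(f"{where}: sender has no user id")
    return problems
```

Returning a list of problems rather than raising on the first one lets a user see everything that needs fixing in a single run. Messages whose sender has no user (for example system messages) are flagged rather than silently dropped.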

# 3 Access: steps for a user to access the information

The code for data extraction and data transformation will be published on GitHub in a free, publicly accessible repository. Users are welcome to review, copy, and edit the files to obtain their own Microsoft Teams (meta)data and to run the code in their local environment. If the run succeeds, users can download the transformed data as CSV, ready for ONA in Polinode.
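
As an illustration of the final download step, the sketch below serializes edge records to CSV text using only the standard library. The helper name `rows_to_csv` is hypothetical, and the column names simply mirror the edge fields used in the example section; whether Polinode expects exactly these headers is an assumption that should be checked against its import requirements.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One edge record in the shape produced by the transformation example.
edges = [
    {
        "source": "user-id-1",
        "target": "channel-general",
        "timestamp": "2023-05-17T10:10:00Z",
    },
]
csv_text = rows_to_csv(edges, ["source", "target", "timestamp"])
print(csv_text)
```

With pandas, the equivalent for a DataFrame of edges would be `edges_df.to_csv("edges.csv", index=False)`, where `index=False` keeps the row index out of the file.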

# 4 Structure: information schema

In general, the original Microsoft Teams data covers three nested entities: teams, channels (within each team), and messages (within each channel). Their main fields are:

*   Teams: id, displayName, description, and createdDateTime.
*   Channels: id, displayName, description, and createdDateTime.
*   Messages: id, createdDateTime, sender information (from), and the message body content (body).
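
The schema above can also be written down as type hints, which makes the nesting explicit. This is a sketch under assumptions: the nesting of channels inside teams and messages inside channels follows the sample data in the example section, the `User` fields are taken from that sample, and all class names are hypothetical.

```python
from typing import List, TypedDict


class User(TypedDict):
    id: str
    displayName: str
    userPrincipalName: str


class Sender(TypedDict):
    user: User


class Body(TypedDict):
    content: str


# "from" is a Python keyword, so Message uses the functional TypedDict syntax.
Message = TypedDict(
    "Message",
    {"id": str, "createdDateTime": str, "from": Sender, "body": Body},
)


class Channel(TypedDict, total=False):
    id: str
    displayName: str
    description: str
    createdDateTime: str
    messages: List[Message]


class Team(TypedDict, total=False):
    id: str
    displayName: str
    description: str
    createdDateTime: str
    channels: List[Channel]
```

`total=False` marks the team and channel fields as optional, since the sample data omits `description` and `createdDateTime`. These hints are not enforced at runtime; they document the expected shape and can be checked by a static type checker.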

# 5 Example

Show an example request and response for at least one intended use of the information that demonstrates access and structure. So far, I have only the data transformation below; I am still working on the data extraction section.

In [7]:
# load packages
import pandas as pd
import numpy as np
import json
import requests  # not used yet; reserved for the upcoming API (data extraction) step

In [8]:
# sample Teams data
teams_data = {
  "teams": [
    {
      "id": "team-id-1",
      "displayName": "Team 1",
      "channels": [
        {
          "id": "channel-id-1",
          "displayName": "General",
          "messages": [
            {
              "id": "message-id-1",
              "createdDateTime": "2023-05-17T10:10:00Z",
              "from": {
                "user": {
                  "id": "user-id-1",
                  "displayName": "John Doe",
                  "userPrincipalName": "john.doe@example.com"
                }
              },
              "body": {
                "content": "Hello, team! This is the first message."
              }
            },
            {
              "id": "message-id-2",
              "createdDateTime": "2023-05-17T10:12:00Z",
              "from": {
                "user": {
                  "id": "user-id-2",
                  "displayName": "Jane Smith",
                  "userPrincipalName": "jane.smith@example.com"
                }
              },
              "body": {
                "content": "Hi John! Welcome to the team."
              }
            }
          ]
        }
      ]
    }
  ]
}

In [9]:
nodes = {}  # unique users (entities), keyed by user id
edges = []  # one user -> channel interaction per message

for team in teams_data['teams']:
    for channel in team['channels']:
        for message in channel['messages']:
            user = message['from']['user']
            user_id = user['id']
            if user_id not in nodes:
                nodes[user_id] = {
                    "id": user_id,
                    "displayName": user['displayName'],
                    "userPrincipalName": user['userPrincipalName']
                }

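            # create edges
            # NOTE: the target below is a fixed placeholder label; with real data it
            # would presumably be derived from the channel (e.g. channel['id'])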
            edge = {
                "source": user_id,
                "target": "channel-general",
                "timestamp": message['createdDateTime']
            }
            edges.append(edge)

nodes_list = list(nodes.values())

# Print nodes and edges
print("Nodes:", json.dumps(nodes_list, indent=2))
print("Edges:", json.dumps(edges, indent=2))

Nodes: [
  {
    "id": "user-id-1",
    "displayName": "John Doe",
    "userPrincipalName": "john.doe@example.com"
  },
  {
    "id": "user-id-2",
    "displayName": "Jane Smith",
    "userPrincipalName": "jane.smith@example.com"
  }
]
Edges: [
  {
    "source": "user-id-1",
    "target": "channel-general",
    "timestamp": "2023-05-17T10:10:00Z"
  },
  {
    "source": "user-id-2",
    "target": "channel-general",
    "timestamp": "2023-05-17T10:12:00Z"
  }
]
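
Every edge above points at the fixed label `channel-general`. If the intent is for each message to link its sender to the channel it was posted in, a variant that derives the target from the channel's own `id` might look like the sketch below. The function name `build_edges` is hypothetical, and using the channel id as the target node is an assumption about the intended network design.

```python
def build_edges(teams_data):
    """Create one user -> channel edge per message, using the channel id as target."""
    edges = []
    for team in teams_data["teams"]:
        for channel in team["channels"]:
            for message in channel["messages"]:
                edges.append({
                    "source": message["from"]["user"]["id"],
                    "target": channel["id"],
                    "timestamp": message["createdDateTime"],
                })
    return edges
```

With more than one channel in the data, this keeps edges from different channels distinguishable, which the fixed label cannot do.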


In [11]:
# convert nodes and edges to tables
nodes_df = pd.DataFrame(list(nodes.values()))
edges_df = pd.DataFrame(edges)
# only the last expression in a cell is displayed, so this shows the edges table
edges_df

| | source | target | timestamp |
|---|---|---|---|
| 0 | user-id-1 | channel-general | 2023-05-17T10:10:00Z |
| 1 | user-id-2 | channel-general | 2023-05-17T10:12:00Z |
