# Activity Module  

This notebook mirrors the code used in the application's Activity Module. It reproduces the same results in incremental steps, explaining what happens at each step and where to find supporting documentation.

The first step is to import the Python packages that the module needs.

In [1]:
import os
import asyncio
import requests
import json
import sys
sys.path.append("../")

from datetime import datetime, timedelta
from app.utility.helps import Bob
from app.utility.file_management import File_Management
from app.utility.fabric import File_Table_Management

## Getting settings and context  
The settings hold the information your service principal uses to write to your lakehouse files. The context holds the tokens that the APIs use to verify your access rights to the data retrieved from the system.


In [2]:
bob = Bob()
settings = bob.get_settings()
headers = bob.get_context()


An exception occurred while reading the file: 'ServicePrincipal'


UnboundLocalError: local variable 'tenant_id' referenced before assignment

## Initializing the File Table Management 
The Fabric `File_Table_Management` class provides methods for creating folders, listing folder contents, writing files, and deleting files. It is initialized with the `Client ID`, `Client Secret`, `Tenant Id` and `Workspace Name` taken from the settings in the previous step. Its methods do all the file and table management work for the Lakehouse.

In [None]:
sp = json.loads(settings['ServicePrincipal'])
FF = File_Table_Management(
    tenant_id=sp['TenantId'],
    client_id=sp['AppId'],
    client_secret=sp['AppSecret'],
    workspace_name=settings['WorkspaceName']
)
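
If `ServicePrincipal` is missing from the settings, the lookup above fails with a bare `KeyError`. A minimal sketch of checking the settings first is shown here. `parse_service_principal` is a hypothetical helper, not part of the application, and it assumes `ServicePrincipal` is stored as a JSON string with the `TenantId`, `AppId` and `AppSecret` keys used above.

```python
import json

REQUIRED_SP_KEYS = ("TenantId", "AppId", "AppSecret")

def parse_service_principal(settings):
    """Hypothetical helper: return the service principal dict or raise a clear error."""
    raw = settings.get("ServicePrincipal")
    if raw is None:
        raise ValueError("settings has no 'ServicePrincipal' entry")
    # assumption: the value is a JSON string, as in the cell above
    sp = json.loads(raw) if isinstance(raw, str) else raw
    missing = [key for key in REQUIRED_SP_KEYS if key not in sp]
    if missing:
        raise ValueError(f"ServicePrincipal is missing keys: {missing}")
    if "WorkspaceName" not in settings:
        raise ValueError("settings has no 'WorkspaceName' entry")
    return sp

# illustrative values only, not real credentials
example_settings = {
    "ServicePrincipal": json.dumps({"TenantId": "t", "AppId": "a", "AppSecret": "s"}),
    "WorkspaceName": "demo-workspace",
}
sp_example = parse_service_principal(example_settings)
```

Failing early with a named key makes a configuration problem easier to spot than an error raised deeper inside the client.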

## Get the State  
The `state.json` file records run information in the Activity folder of the Lakehouse Files. It holds the LastRun date and time in UTC, in ISO 8601 format. At the beginning of a run, the LastRun value determines how far back to read, up to a maximum of 30 days of history from the current date. If the LastRun date is older than that maximum, the Activity module reads from the maximum date to the current date.
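
The 30-day limit can be sketched as a clamp on the start date. This is an illustration under stated assumptions, not the module's own code: `clamp_start` and `MAX_HISTORY_DAYS` are hypothetical names, and the dates are naive `datetime` values.

```python
from datetime import datetime, timedelta

MAX_HISTORY_DAYS = 30  # maximum look-back described above

def clamp_start(last_run, now, max_days=MAX_HISTORY_DAYS):
    """Hypothetical helper: never start earlier than `max_days` before `now`."""
    earliest = now - timedelta(days=max_days)
    start = max(last_run, earliest)
    # align to midnight so each request covers a whole day
    return start.replace(hour=0, minute=0, second=0, microsecond=0)

now = datetime(2024, 3, 31, 15, 0, 0)
recent = clamp_start(datetime(2024, 3, 29, 8, 30, 0), now)  # within the limit
stale = clamp_start(datetime(2023, 12, 1, 0, 0, 0), now)    # older than the limit
```

Here `recent` keeps its own date, while `stale` is pulled forward to 30 days before `now`.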

In [None]:
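fm = File_Management()
try:
    config = await fm.read(file_name="state.yaml")
except Exception as e:
    print(f"Error: {e}")
    config = None  # no state could be read

# the state may come back as a JSON string or as a dict
if isinstance(config, str):
    lastRun = json.loads(config).get("lastRun")
elif config:
    lastRun = config.get("lastRun")
else:
    lastRun = None

# if lastRun is recorded then proceed from there, otherwise go back the maximum 30 days
earliest = datetime.now() - timedelta(days=30)
lastRun_tm = bob.convert_dt_str(lastRun) if lastRun else earliest
# never read further back than the 30 day maximum
lastRun_tm = max(lastRun_tm, earliest)
pivotDate = lastRun_tm.replace(hour=0, minute=0, second=0, microsecond=0)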

In [None]:
async def record_audits(DirectoryClient, FF:File_Table_Management, audit, pivotDate, pageIndex, outputPath):
    # the first page of a day is named YYYYMMDD.json, later pages YYYYMMDD_<pageIndex>.json
    if pageIndex == 1:
        lakehouseFile = f"{pivotDate.strftime('%Y%m%d')}.json"
    else:
        lakehouseFile = f"{pivotDate.strftime('%Y%m%d')}_{pageIndex}.json"

    # stream the audit records straight to the Lakehouse file
    FF.write_json_to_file(directory_client=DirectoryClient, file_name=lakehouseFile, json_data=audit)
    # alternative: upload a file that was first written locally
    #FF.upload_file_to_directory(directory_client=dc, local_path=outputPath, file_name=lakehouseFile)

    # return the next page index; rebinding pageIndex here would not change the caller's variable
    return pageIndex + 1
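
The file naming rule used in `record_audits` can be tried on its own. A small sketch, where `lakehouse_file_name` is a hypothetical helper written only for this illustration:

```python
from datetime import datetime

def lakehouse_file_name(pivot_date, page_index):
    """Hypothetical helper mirroring the naming rule in record_audits."""
    stem = pivot_date.strftime("%Y%m%d")
    if page_index == 1:
        return f"{stem}.json"
    # later pages of the same day get the page index as a suffix
    return f"{stem}_{page_index}.json"

day = datetime(2024, 3, 1)
names = [lakehouse_file_name(day, i) for i in (1, 2, 3)]
```

Each page of a day needs a distinct page index, otherwise later pages overwrite the first file.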

## Pulling data from REST APIs
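
Each request covers a single day: the window starts at midnight and ends one second before the next midnight. A small sketch of that window follows. `day_window` and `TIME_FORMAT` are hypothetical names, and the timestamp format is the one used in the request URL in the next cell.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def day_window(pivot_date):
    """Hypothetical helper: return (start, end) strings covering one day."""
    start = pivot_date.replace(hour=0, minute=0, second=0, microsecond=0)
    # add 24 hours and remove 1 second to stay inside the same day
    end = start + timedelta(hours=24) - timedelta(seconds=1)
    return start.strftime(TIME_FORMAT), end.strftime(TIME_FORMAT)

start_str, end_str = day_window(datetime(2024, 3, 1, 13, 45))
```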

In [None]:
while pivotDate < datetime.now():
    audits = list()

    # keep the start and end time within a 24 hour period by adding 24 hours and removing 1 second
    nextDate = (pivotDate + timedelta(hours=24)) + timedelta(seconds=-1)
    rest_api = f"admin/activityevents?startDateTime='{pivotDate.strftime('%Y-%m-%dT%H:%M:%SZ')}'&endDateTime='{nextDate.strftime('%Y-%m-%dT%H:%M:%SZ')}'"

    continuationUri = None
    result = None

    # Python has no do-while, so loop forever and break when there are no more pages
    while True:

        if continuationUri:
            result = await bob.invokeAPI(continuationUri)
        else:
            result = await bob.invokeAPI(rest_api=rest_api, headers=headers)

        # stop paging for this day if the result carries an error
        if "ERROR" in result:
            print(result)
            break

        if result.get("activityEventEntities"):
            audits.append(result.get("activityEventEntities"))

        # follow the continuation link until the API stops returning one
        continuationUri = result.get("continuationUri")
        if not continuationUri:
            break

    if audits:
        # create the folder structure for the output path
        localPath = f"{settings.get('OutputPath')}/activity/{pivotDate.strftime('%Y')}/{pivotDate.strftime('%m')}/"
        lakehousePath = f"{settings['LakehouseName']}.Lakehouse/Files/activity/{pivotDate.strftime('%Y')}/{pivotDate.strftime('%m')}/"
        #outputPath = bob.create_path(localPath)
        outputPath = localPath

        dc = await FF.create_directory(file_system_client=FF.fsc, directory_name=lakehousePath)

        # write each page of audits to its own file
        for pageIndex, audit in enumerate(audits, start=1):
            await record_audits(dc, FF, audit, pivotDate, pageIndex, outputPath)

    pivotDate += timedelta(days=1)
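
Paging with a continuation link has a do-while shape: fetch once, then keep fetching while the response carries a `continuationUri`. A minimal synchronous sketch with a stand-in `fetch` function is shown here. `collect_pages` and `fake_responses` are hypothetical and only imitate the response keys used above; they do not call the real API.

```python
def collect_pages(fetch, first_url):
    """Hypothetical helper: follow continuationUri until the API stops returning one."""
    pages = []
    url = first_url
    while True:  # do-while: always fetch at least once
        result = fetch(url)
        events = result.get("activityEventEntities")
        if events:
            pages.append(events)
        url = result.get("continuationUri")
        if not url:
            break
    return pages

# stand-in for the real API: two pages, the second has no continuation link
fake_responses = {
    "page1": {"activityEventEntities": [{"Id": 1}], "continuationUri": "page2"},
    "page2": {"activityEventEntities": [{"Id": 2}], "continuationUri": None},
}
pages = collect_pages(fake_responses.get, "page1")
```

Passing `fetch` as an argument keeps the paging logic testable without network access.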


In [None]:
environ = os.environ

for e in environ:
    print(e)

In [None]:

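import yaml  # PyYAML; needed for yaml.dump and not imported in the first cell

# `data` is the state to persist and must be defined before this cell runs
print("upon successful completion of the modules save the state")
with open('state.yaml', 'w') as file:
    yaml.dump(data, file)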


In [None]:
bob.convert_dt_str(catalog_lastFulScan) - datetime.now()

In [None]:
data