## Mental Health Ops - Detect Depressive Sentiment from Tweets



### Azure Machine Learning Imports

In this first code cell, we import key Azure Machine Learning modules that we will use below. 

In [50]:
import os
import requests
import tempfile
import azureml.core
from azureml.core import Workspace, Experiment, Datastore
from azureml.widgets import RunDetails

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.51.0


### Pipeline-specific SDK imports

Here, we import the key pipeline modules.

In [51]:
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

print("Pipeline SDK-specific imports completed")

Pipeline SDK-specific imports completed


### Initialize Workspace

Initialize a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace(class%29) object from the persisted configuration.

In [52]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

# Default datastore: the Azure Blob Store associated with the workspace
def_blob_store = ws.get_default_datastore()
# Fetch the blob datastore explicitly by name (this replaces the object assigned above)
def_blob_store = Datastore(ws, "workspaceblobstore")
print("Blobstore's name: {}".format(def_blob_store.name))

dp-tweets
my_mlops_prj
eastus2
34a65394-13fc-43ed-8e03-a0f81df6e347
Blobstore's name: workspaceblobstore
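
`Workspace.from_config()` reads the workspace details from a persisted `config.json` file. When that call fails, it can help to check the file itself first. The sketch below is a hypothetical helper, not part of the Azure ML SDK; it assumes only that the file uses the standard `subscription_id`, `resource_group`, and `workspace_name` keys.

```python
import json
from pathlib import Path

# Keys that a workspace config.json is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_path):
    """Return the required keys that are absent or empty in a config.json file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [key for key in REQUIRED_KEYS if not config.get(key)]
```

An empty list means all three keys are present and non-empty; it does not confirm that the values point to a workspace that exists.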


### Upload data to Blob Store
This cell uploads files (the two tweet datasets and a trained machine learning model) to the Azure Blob Storage datastore associated with the workspace.

In [53]:
from azureml.core import Dataset
from azureml.data.data_reference import DataReference


tweet1 = "./data/depressive_tweets.csv"
tweet2 = "./data/random_tweets.csv"
modelsvm = "./models/model_svm1.pkl"

# Upload the depressive tweets, reusing the def_blob_store object obtained earlier.
# upload_files() reads the files itself, so they do not need to be opened first.
def_blob_store.upload_files([tweet1], overwrite=True)
print("Depressive Tweets: Upload call completed")

# Upload the random tweets
def_blob_store.upload_files([tweet2], overwrite=True)
print("Random Tweets: Upload call completed")

# Upload the machine learning model into the "models" folder of the datastore
def_blob_store.upload_files([modelsvm], target_path="models", overwrite=True)
print("Model: Upload call completed")

'''
my_dataset1 = Dataset.File.from_files([(def_blob_store, 'tw-data/random_tweets.csv')])
my_dataset2 = Dataset.File.from_files([(def_blob_store, 'tw-data/depressive_tweets.csv')])
'''


Uploading an estimated of 1 files
Uploading ./data/depressive_tweets.csv
Uploaded ./data/depressive_tweets.csv, 1 files out of an estimated total of 1
Uploaded 1 files
Depressive Tweets: Upload call completed
Uploading an estimated of 1 files
Uploading ./data/random_tweets.csv
Uploaded ./data/random_tweets.csv, 1 files out of an estimated total of 1
Uploaded 1 files
Random Tweets: Upload call completed
Uploading an estimated of 1 files
Uploading ./models/model_svm1.pkl
Uploaded ./models/model_svm1.pkl, 1 files out of an estimated total of 1
Uploaded 1 files
Model: Upload call completed


"\nmy_dataset1 = Dataset.File.from_files([(def_blob_store, 'tw-data/random_tweets.csv')])\nmy_dataset2 = Dataset.File.from_files([(def_blob_store, 'tw-data/depressive_tweets.csv')])\n"

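The three uploads above follow the same pattern, so they could be wrapped in one small function. The sketch below is a hypothetical helper (the name `upload_if_present` is not from the notebook or the SDK); it assumes only that the datastore object exposes `upload_files(files, target_path=..., overwrite=...)`, as `def_blob_store` does, and it checks that the local file exists before calling it.

```python
import os


def upload_if_present(datastore, local_path, target_path=None):
    """Upload one local file to a datastore, failing early if the file is missing."""
    if not os.path.isfile(local_path):
        raise FileNotFoundError(f"Nothing to upload at {local_path}")
    # upload_files takes a list of local paths; target_path is the folder
    # inside the datastore (None uploads to the datastore root).
    datastore.upload_files([local_path], target_path=target_path, overwrite=True)
```

In this notebook it might be called as `upload_if_present(def_blob_store, modelsvm, target_path="models")`.
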
#### List of Compute Targets on the workspace

In [54]:
cts = ws.compute_targets
for ct in cts:
    print(ct)

x221040381
cpu-cluster
cpu-cluster-ml


#### Retrieve or create an Azure Machine Learning compute
1. Creating the configuration
2. Creating the Azure Machine Learning compute



In [55]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Attempts to retrieve an existing compute target with the specified name 
aml_compute_target = "cpu-cluster"
try:
    aml_compute = AmlCompute(ws, aml_compute_target)
    print("found existing compute target.")
except ComputeTargetException:
    print("creating new compute target")
    # The compute target doesn't exist yet, so provision a new one under the same name
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = "STANDARD_D2_V2",
                                                                min_nodes = 1, 
                                                                max_nodes = 4)    
    aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)
    aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    
print("Azure Machine Learning Compute attached")


found existing compute target.
Azure Machine Learning Compute attached


In [56]:
# For a more detailed view of current Azure Machine Learning Compute status

print(aml_compute.get_status().serialize())

{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 1, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2023-12-09T17:28:08.456000+00:00', 'errors': None, 'creationTime': '2023-12-09T16:23:25.454351+00:00', 'modifiedTime': '2023-12-09T16:23:35.280897+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT1800S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D2_V2'}
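
The serialized status is a plain dictionary, so it can be checked in code rather than read by eye. The sketch below uses only keys that appear in the output above; the two function names are hypothetical, and treating `Succeeded` plus `Steady` with no errors as "ready" is an assumption about what matters for this pipeline, not an SDK definition.

```python
def compute_is_ready(status):
    """Return True when a serialized AmlCompute status describes a settled, error-free cluster."""
    provisioned = status.get("provisioningState") == "Succeeded"
    steady = status.get("allocationState") == "Steady"
    no_errors = not status.get("errors")
    return provisioned and steady and no_errors


def idle_nodes(status):
    """Return the number of idle nodes reported in the status dictionary."""
    return status.get("nodeStateCounts", {}).get("idleNodeCount", 0)
```

With the notebook's objects, this could be used as `compute_is_ready(aml_compute.get_status().serialize())`.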


### ML Pipeline
This cell sets up a run configuration for Azure Machine Learning that specifies the Docker environment and the Conda dependencies.

In [57]:

from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

# create a new runconfig object
run_config = RunConfiguration()

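# enable Docker (the 'enabled' attribute is deprecated; the warning printed after this cell
# recommends DockerConfiguration with use_docker=True instead)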
run_config.environment.docker.enabled = True

# set Docker base image to the default CPU-based image
run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE

# use conda_dependencies.yml to create a conda environment in the Docker image for execution
run_config.environment.python.user_managed_dependencies = False

# specify CondaDependencies obj
#run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])
run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=["pip", "python=3.8", "pandas==1.3.4", "numpy", "matplotlib"], 
pip_packages=['blis==0.4.1',
'certifi==2021.10.8',
'charset-normalizer==2.0.7',
'click==8.0.3',
'cycler==0.11.0',
'cymem==2.0.6',
'fonttools==4.28.1',
'ftfy==6.0.3',
'idna==3.3',
'joblib==1.1.0',
'kiwisolver==1.3.2',
'murmurhash==1.0.6',
'nltk==3.6.5',
'packaging==21.2',
'Pillow==8.4.0',
'plac==0.9.6',
'preshed==3.0.6',
'pyparsing==2.4.7',
'python-dateutil==2.8.2',
'pytz==2021.3',
'regex==2021.11.10',
'requests==2.26.0',
'scikit-learn==1.0.1',
'scipy==1.7.2',
'seaborn==0.11.2',
'setuptools==57.0.0',
'setuptools-scm==6.3.2',
'six==1.16.0',
'spacy==2.2.3',
'srsly==1.0.5',
'thinc==7.3.0',
'threadpoolctl==3.0.0',
'tomli==1.2.2',
'tqdm==4.62.3',
'typing-extensions==4.7.1',
'urllib3==1.26.7',
'wasabi==0.8.2',
'wcwidth==0.2.5'
])

'enabled' is deprecated. Please use the azureml.core.runconfig.DockerConfiguration object with the 'use_docker' param instead.
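
A long list of pinned packages like the one above is easier to maintain in a requirements-style text file than inline in the notebook. The sketch below is one possible way to load such a list; the function name and the idea of a separate file are assumptions, not something the original notebook does.

```python
def pinned_packages(requirements_text):
    """Return the non-empty, non-comment entries from requirements-style text."""
    packages = []
    for line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if entry:
            packages.append(entry)
    return packages
```

The resulting list could then be passed as the `pip_packages` argument of `CondaDependencies.create`, in place of the inline list.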


### Preprocessing Data Reference

This cell defines the first pipeline step: a PythonScriptStep that runs the data preprocessing script on the two tweet datasets.

In [58]:
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep
from azureml.data.data_reference import DataReference

# Uses default values for PythonScriptStep construct.

source_directory = './src'
print('Source directory for the step is {}.'.format(os.path.realpath(source_directory)))
# Referencing Depression tweets
dep_data = DataReference(
        datastore=def_blob_store,
        data_reference_name="dep_data",
        path_on_datastore=os.path.join("dep_data","depressive_tweets.csv"),
    )
print("DataReference object1 created")
#Referencing Random tweets
rand_data = DataReference(
        datastore=def_blob_store,
        data_reference_name="rand_data",
        path_on_datastore=os.path.join("rand_data","random_tweets.csv"),
    )
print("DataReference object2 created")

# PipelineData represents the output data of the pipeline step
preprocessed_data = PipelineData('preprocessed_data')

step1 = PythonScriptStep(name="preprocess_step",
                         script_name="preprocess.py", 
                         arguments=["--dep_data", dep_data, "--rand_data", rand_data, "--preprocessed_data",
                         preprocessed_data],
                         inputs = [dep_data, rand_data],
                         outputs= [preprocessed_data],
                         compute_target=aml_compute, 
                         source_directory=source_directory,
                         runconfig=run_config,
                         allow_reuse=True)
print("Step1 created")

Source directory for the step is /mnt/batch/tasks/shared/LS_root/mounts/clusters/cpu-cluster-ml/code/Users/x22104038/src.
DataReference object1 created
DataReference object2 created
Step1 created
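
The `preprocess.py` script itself is not shown in this notebook. When the step runs, the `DataReference` and `PipelineData` objects in `arguments` are replaced by paths on the compute target, so the script receives ordinary strings. Below is a minimal sketch of how such a script might read them, assuming only the three flag names used in `step1`; the function name and help texts are hypothetical.

```python
import argparse


def parse_step_args(argv=None):
    """Parse the three path arguments that step1 passes to the script."""
    parser = argparse.ArgumentParser(description="Preprocess tweet data (sketch)")
    parser.add_argument("--dep_data", required=True,
                        help="path to the depressive tweets data")
    parser.add_argument("--rand_data", required=True,
                        help="path to the random tweets data")
    parser.add_argument("--preprocessed_data", required=True,
                        help="output location for the preprocessed data")
    # argv=None makes argparse read sys.argv, which is what happens in a real run.
    return parser.parse_args(argv)
```

Inside the script, `args = parse_step_args()` would then expose the paths as `args.dep_data`, `args.rand_data`, and `args.preprocessed_data`.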


### Training the model

In [59]:
# This step uses the same source_directory as the preprocessing step
source_directory = './src'
print('Source directory for the step is {}.'.format(os.path.realpath(source_directory)))
# Creating DataReference for Preprocessed Data
preprocessed_data = DataReference(
        datastore=def_blob_store,
        data_reference_name="preprocessed_data",
        path_on_datastore=os.path.join("preprocessed_data.csv"),
    )
print("DataReference object4 created")

# This step trains a machine learning model on the preprocessed data.
# All steps use the same Azure Machine Learning compute target.
step2 = PythonScriptStep(name="training_step",
                         script_name="modeltrain.py", 
                         arguments=["--preprocessed_data", preprocessed_data],
                         inputs = [preprocessed_data],
                         compute_target=aml_compute, 
                         source_directory=source_directory,
                         runconfig=run_config)
step2.run_after(step1)
# run_after makes the training step wait for the preprocessing step, so the two run sequentially


Source directory for the step is /mnt/batch/tasks/shared/LS_root/mounts/clusters/cpu-cluster-ml/code/Users/x22104038/src.
DataReference object4 created


### Prediction

In [60]:
# This step again uses the same source_directory
# and makes predictions with the trained machine learning model
source_directory = './src'
print('Source directory for the step is {}.'.format(os.path.realpath(source_directory)))

step3 = PythonScriptStep(name="prediction",
                         script_name="prediction.py", 
                         compute_target=aml_compute, 
                         source_directory=source_directory,
                         runconfig=run_config)
step3.run_after(step2)
# run_after makes the prediction step wait for the training step

# List of steps to run
steps = [step1, step2, step3]

print("Step lists created")

Source directory for the step is /mnt/batch/tasks/shared/LS_root/mounts/clusters/cpu-cluster-ml/code/Users/x22104038/src.
Step lists created
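
The two `run_after` calls impose an ordering on the steps without declaring a data dependency between them. The effect can be pictured with a small standard-library sketch; it only illustrates the resulting order and makes no claim about how Azure Machine Learning schedules steps internally. The step names are the ones given to the three `PythonScriptStep` objects.

```python
from graphlib import TopologicalSorter  # requires Python 3.9 or later

# Map each step name to the set of steps it must run after,
# mirroring step2.run_after(step1) and step3.run_after(step2).
run_after = {
    "preprocess_step": set(),
    "training_step": {"preprocess_step"},
    "prediction": {"training_step"},
}

# A topological sort yields an order that respects every constraint.
execution_order = list(TopologicalSorter(run_after).static_order())
print(execution_order)
```

Because the constraints form a single chain, only one order is possible: preprocessing, then training, then prediction.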


### Build the pipeline


In [61]:
# Syntax
# Pipeline(workspace, 
#          steps, 
#          description=None, 
#          default_datastore_name=None, 
#          default_source_directory=None, 
#          resolve_closure=True, 
#          _workflow_provider=None, 
#          _service_endpoint=None)

pipeline1 = Pipeline(workspace=ws, steps=steps)
print("Pipeline is built")

Pipeline is built


### Validate the pipeline


In [62]:
pipeline1.validate()
print("Pipeline validation complete")

Pipeline validation complete


In [63]:
# Submit syntax
# submit(experiment_name, 
#        pipeline_parameters=None, 
#        continue_on_step_failure=False, 
#        regenerate_outputs=False)

pipeline_run1 = Experiment(ws, 'Depression_Tweets').submit(pipeline1, regenerate_outputs=False)
print("Pipeline is submitted for execution")

Created step preprocess_step [e043d517][9eb0fc25-c483-4c48-b466-784f00805a13], (This step is eligible to reuse a previous run's output)
Created step training_step [64bdee9d][c71a5ad9-84ca-4eef-a9f9-542fb94fb05e], (This step is eligible to reuse a previous run's output)
Created step prediction [3ecbcc88][9449445c-cfee-44c7-85e8-009bc5b59883], (This step is eligible to reuse a previous run's output)
Using data reference dep_data for StepId [a57e9120][5ec54ca7-e6ac-406f-802e-74fe8ac91e59], (Consumers of this data are eligible to reuse prior runs.)
Using data reference rand_data for StepId [5bdcd7d8][afe62eae-3fa3-4e15-9519-57ba4b7bcc08], (Consumers of this data are eligible to reuse prior runs.)
Using data reference preprocessed_data for StepId [9dba6346][20022763-cfd2-4e69-b7cc-1a12d4492b2b], (Consumers of this data are eligible to reuse prior runs.)
Submitted PipelineRun 71e05035-08d3-4e9f-904e-366e69c47c3d
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/71e05035-08d3-4