# Working with APIs and SDKs

APIs and SDKs are the tools you will most likely work with when building automation, interacting with the cloud, and tying the pieces of an ML operations environment together. They help you produce something reusable and automated, and that automation increases your confidence in the result.

## Installing Azure Command-Line Interface (CLI)

After installing the Azure CLI (https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli), you can verify the installation from the terminal:

* az --version: shows the version of azure-cli that is installed.
* az extension list: lists the extensions that are installed.
* az extension add -n ml -y: adds the ml extension, which lets you connect to Azure Machine Learning studio.
* az ml --help: shows help for the ml extension.
* az login: signs in to your Azure account.
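
The read-only checks above can also be scripted. The following is a minimal sketch, assuming the `az` executable is on the `PATH`; the helper name `run_az_checks` and the `AZ_CHECKS` list are illustrative and not part of the Azure CLI. `az login` is left out because it is interactive.

```python
import shutil
import subprocess

# Read-only checks from the list above, as argument lists (without the "az" prefix).
AZ_CHECKS = [
    ["--version"],
    ["extension", "list"],
    ["ml", "--help"],
]


def run_az_checks(checks=AZ_CHECKS):
    """Run each check and return {command: exit code}, or None if az is not installed."""
    az_path = shutil.which("az")
    if az_path is None:
        return None
    results = {}
    for args in checks:
        completed = subprocess.run([az_path, *args], capture_output=True, text=True)
        results["az " + " ".join(args)] = completed.returncode
    return results
```

An exit code of 0 means the command succeeded. A non-zero code for `az ml --help` may indicate that the ml extension has not been added yet.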

Additionally, the azureml-core package must be installed in the Python environment.
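
The package is typically installed with `pip install azureml-core`. As a quick check before running the cells below, this sketch tests whether a module can be found in the current environment; the helper name `is_installed` is illustrative.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be found in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names such as "azureml.core" when the parent package is missing.
        return False


print("azureml-core available:", is_installed("azureml.core"))
```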

## AzureML Studio with Python

In [None]:
import os
import creds
import azure.core

from azureml.core import Workspace, Environment
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

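
The `creds` module imported above is a local file that is not shown in this notebook. Below is a minimal sketch of what it could contain, assuming it only needs to expose `subscription_id` and that the value comes from an environment variable. The variable name `AZURE_SUBSCRIPTION_ID` and the function `read_subscription_id` are assumptions made for illustration.

```python
import os


def read_subscription_id(env_var="AZURE_SUBSCRIPTION_ID"):
    """Read the subscription ID from the environment instead of hard-coding it."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return value
```

In a `creds.py` file, a line such as `subscription_id = read_subscription_id()` would then provide the `creds.subscription_id` attribute. Keeping the value out of the notebook avoids committing it to version control.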

In [None]:
resource_name = "Demo_RS"
workspace_name = "Demo_WS"
aml_compute_target = "demo-cluster"
experiment_name = "demo_experiment"


### Create the workspace

Load the Azure ML workspace from a previously saved configuration file if one exists; otherwise create the workspace and save its configuration.

In [None]:
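try:
    # Reuse the workspace described by a saved config.json, if one can be found
    ws = Workspace.from_config()
    print("Workspace already exists")
except Exception:
    # No usable configuration was found, so create the workspace and save its configuration
    ws = Workspace.create(workspace_name,
                          resource_group = resource_name,
                          create_resource_group = True,
                          subscription_id = creds.subscription_id,
                          location = "East US")
    ws.write_config(".azureml")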

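
`ws.write_config` saves a small `config.json` holding the subscription ID, resource group, and workspace name, which `Workspace.from_config` later looks for. The sketch below imitates that lookup with the standard library only, to show what is being read. It assumes the file sits in the start directory, its `.azureml` subfolder, or a parent directory. It is an illustration, not the SDK's implementation, and the function names are hypothetical.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def find_workspace_config(start_dir):
    """Return the first config.json found in start_dir, its .azureml folder, or a parent."""
    start = Path(start_dir).resolve()
    for folder in [start, *start.parents]:
        for candidate in (folder / ".azureml" / "config.json", folder / "config.json"):
            if candidate.is_file():
                return candidate
    return None


def load_workspace_config(path):
    """Parse config.json and check that the three workspace properties are present."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"config.json is missing keys: {sorted(missing)}")
    return config
```

Calling `find_workspace_config(".")` from the notebook folder returns `None` when no configuration has been saved yet, which corresponds to the case where the cell above falls through to `Workspace.create`.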

In [None]:
Workspace.from_config()


In [None]:
# Create compute target

try:
    # Look up an existing compute target by name
    aml_compute = AmlCompute(ws, aml_compute_target)
    print("This compute target already exists")
except ComputeTargetException:
    print("Creating new compute target :", aml_compute_target)

    provisioning_config = AmlCompute.provisioning_configuration(vm_size = "STANDARD_D2_V2",
                                                                min_nodes = 1,
                                                                max_nodes = 4,
                                                                idle_seconds_before_scaledown = 3000)
    aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)
    aml_compute.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)

print("Azure ML compute attached now")


### Destroy workspace and resources

In [None]:
ws.delete(delete_dependent_resources = True, no_wait = False)


## Hugging Face Transformers

Hugging Face is a company that builds many APIs and packages for ML.

In [2]:
from transformers import pipeline

In [3]:
generator = pipeline("text2text-generation", model ="t5-base")

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [4]:
# Summarize
generator("summarize: Machine Learning in production environmnets is largely seen as the ultimate goal. Sometimes, deployin models can be difficult when automation is not part of the workflow. Creating a foundational process that is reliable and automated is complex and requires commitment from the team and the organizations as a whole")

[{'generated_text': 'machine learning is often seen as the ultimate goal in production . a foundational process that'}]
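
The summary above stops mid-sentence, most likely because the pipeline's default generation length is short. Below is a minimal sketch, assuming the same `t5-base` model and the same transformers version as the notebook, that raises the limit with `max_new_tokens` (the argument this notebook also passes to GPT-2). The function name `summarize` and the default of 60 tokens are illustrative choices. The import is kept inside the function so that defining it does not download anything.

```python
def summarize(text, max_new_tokens=60, model="t5-base"):
    """Summarize text with a T5 pipeline, allowing a longer output than the default."""
    from transformers import pipeline  # imported lazily; requires the transformers package

    generator = pipeline("text2text-generation", model=model)
    # The "summarize: " prefix selects the task; max_new_tokens caps the generated length.
    result = generator("summarize: " + text, max_new_tokens=max_new_tokens)
    return result[0]["generated_text"]
```

Calling `summarize(some_text)` loads the model on every call, so in a notebook it is usually better to create the pipeline once and reuse it, as the cells here do.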

In [5]:
# Sentiment
generator("sst2 sentence: Automation takes hard work but allows you to have a solid deployment")

[{'generated_text': 'positive'}]

In [6]:
# Questions
generator("question: Is deploying models into production hard?")

[{'generated_text': 'not_entailment'}]

In [9]:
# Translation
generator("translate English to French: Automation takes hard work but allows you to have a solid deployment")

[{'generated_text': "L'automatisation exige beaucoup de travail, mais vous permet d'avoir un dé"}]
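
In the cells above, the task is chosen by the text prefix at the start of the input. A small helper can keep those prefixes in one place. This is a sketch: the names `TASK_PREFIXES` and `t5_prompt` are hypothetical, and the table lists only prefixes that appear in this notebook.

```python
# Task prefixes used in the cells above; the model picks its task from this leading text.
TASK_PREFIXES = {
    "summarize": "summarize: ",
    "sentiment": "sst2 sentence: ",
    "translate_en_fr": "translate English to French: ",
}


def t5_prompt(task, text):
    """Prepend the prefix for the given task to the input text."""
    try:
        prefix = TASK_PREFIXES[task]
    except KeyError:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(TASK_PREFIXES)}"
        ) from None
    return prefix + text


print(t5_prompt("sentiment", "Automation takes hard work but allows you to have a solid deployment"))
```

The result can be passed straight to the generator, for example `generator(t5_prompt("summarize", text))`. The bare `question:` prefix is left out of the table because the `not_entailment` output above suggests the model did not treat that input as a question-answering task.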

You can create other generator objects by loading other models as well.

In [10]:
gpt2_generator = pipeline("text-generation", model = "gpt2")

In [11]:
gpt2_generator("The future of computational pathology", max_new_tokens = 512)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "The future of computational pathology will likely involve more rapid advances in medical imaging, neurobiology, and molecular therapy.\n\nThere will be a large population of medical patients with no history of neurologic disease who will need specialized surgery that does NOT require invasive brain scans. In addition to the need for specialized procedures, a large population of new patients will require special training to deal with these procedures, but the general practitioner should certainly not be able to use this kind of training unless the patient has an ongoing history of neurodegenerative conditions, such as Parkinson's condition, as per the recommendations of the European Alzheimer's Society.\n\n. The American Academy of Neurology and American School for Neuroscience, which has approved this plan, has been lobbying the FDA to postpone the adoption of a computerized clinical-trial protocol that is now in the public hands. Although no one knows how long it 

## Hugging Face Datasets

Another core piece of the Hugging Face libraries is Datasets, a library for loading datasets dynamically, similar to other dynamic loading libraries.

It lets you retrieve any dataset hosted on Hugging Face by name.

In [12]:
from datasets import load_dataset, list_datasets

In [13]:
# Explore available datasets
available = list_datasets()
print(len(available))
print([i for i in available if '/' not in i])

63799
['acronym_identification', 'ade_corpus_v2', 'adversarial_qa', 'aeslc', 'afrikaans_ner_corpus', 'ag_news', 'ai2_arc', 'air_dialogue', 'ajgt_twitter_ar', 'allegro_reviews', 'allocine', 'alt', 'amazon_polarity', 'amazon_reviews_multi', 'amazon_us_reviews', 'ambig_qa', 'americas_nli', 'ami', 'amttl', 'anli', 'app_reviews', 'aqua_rat', 'aquamuse', 'ar_res_reviews', 'ar_sarcasm', 'arabic_billion_words', 'arabic_pos_dialect', 'arabic_speech_corpus', 'arcd', 'arsentd_lev', 'art', 'arxiv_dataset', 'ascent_kb', 'aslg_pc12', 'asnq', 'asset', 'assin', 'assin2', 'atomic', 'autshumato', 'banking77', 'bbaw_egyptian', 'bbc_hindi_nli', 'bc2gm_corpus', 'beans', 'best2009', 'bianet', 'bible_para', 'big_patent', 'billsum', 'bing_coronavirus_query_set', 'biomrc', 'biosses', 'blbooks', 'blbooksgenre', 'blended_skill_talk', 'blimp', 'blog_authorship_corpus', 'bn_hate_speech', 'bnl_newspapers', 'bookcorpus', 'bookcorpusopen', 'boolq', 'bprec', 'break_data', 'brwac', 'bsd_ja_en', 'bswac', 'c3', 'c4', 'ca

In [14]:
# load the dataset dynamically by passing the name
movie_rationales = load_dataset("movie_rationales")

Using custom data configuration default


Downloading and preparing dataset movie_rationales/default (download: 3.72 MiB, generated: 8.33 MiB, post-processed: Unknown size, total: 12.04 MiB) to C:\Users\ADMIN\.cache\huggingface\datasets\movie_rationales\default\0.1.0\70ed6b72496c90835e8ee73ebf8d0e49f5ad3aa93f302c8a4b6c886143cfb779...


Dataset movie_rationales downloaded and prepared to C:\Users\ADMIN\.cache\huggingface\datasets\movie_rationales\default\0.1.0\70ed6b72496c90835e8ee73ebf8d0e49f5ad3aa93f302c8a4b6c886143cfb779. Subsequent calls will reuse this data.


In [16]:
# The object is a dict-like mapping of actual datasets
movie_rationales

DatasetDict({
    train: Dataset({
        features: ['review', 'label', 'evidences'],
        num_rows: 1600
    })
    validation: Dataset({
        features: ['review', 'label', 'evidences'],
        num_rows: 200
    })
    test: Dataset({
        features: ['review', 'label', 'evidences'],
        num_rows: 199
    })
})
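
The split sizes reported above can be turned into proportions to see how the data is divided. This small calculation uses only the row counts printed by the `DatasetDict`.

```python
# Row counts copied from the DatasetDict output above.
split_sizes = {"train": 1600, "validation": 200, "test": 199}

total = sum(split_sizes.values())
fractions = {name: rows / total for name, rows in split_sizes.items()}

for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```

The data is divided roughly 80/10/10. With the loaded object, the same dictionary could be built from `movie_rationales.items()` and each split's `num_rows`.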

In [19]:
# Select the "train" dataset and then port it to pandas
train = movie_rationales["train"]
df = train.to_pandas()
df.head(10)

| | review | label | evidences |
|---|---|---|---|
| 0 | plot : two teen couples go to a church party ,... | 0 | [mind - fuck movie, the sad part is, downshift... |
| 1 | the happy bastard 's quick movie review damn\n... | 0 | [it 's pretty much a sunken ship, sutherland i... |
| 2 | it is movies like these that make a jaded movi... | 0 | [the characters and acting is nothing spectacu... |
| 3 | " quest for camelot " is warner bros . '\nfirs... | 0 | [dead on arrival, the characters stink, subpar... |
| 4 | synopsis : a mentally unstable man undergoing ... | 0 | [it is highly derivative and somewhat boring, ... |
| 5 | capsule : in 2176 on the planet mars police ta... | 0 | [sadly what follows is not really up to the bu... |
| 6 | so ask yourself what " 8 mm " ( " eight millim... | 0 | [probably not, tags on a ridiculous self - rig... |
| 7 | that 's exactly how long the movie felt to me ... | 0 | [nasty but unamusing joke, is annoying, they c... |
| 8 | call it a road trip for the walking wounded .\... | 0 | [a sentimental and painfully mundane european ... |
| 9 | plot : a young french boy sees his parents kil... | 0 | [it 's not original , is entirely predictable ... |


In [21]:
df.describe()

| | label |
|---|---|
| count | 1600.0 |
| mean | 0.5 |
| std | 0.500156 |
| min | 0.0 |
| 25% | 0.0 |
| 50% | 0.5 |
| 75% | 1.0 |
| max | 1.0 |
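
These summary statistics are consistent with a perfectly balanced binary label. As a check, assuming the 1600 labels are 800 zeros and 800 ones (an inference from the mean of 0.5, not something the output states directly), the sample standard deviation reproduces the reported 0.500156.

```python
import statistics

# Assumed label distribution: 800 reviews labelled 0 and 800 labelled 1.
labels = [0] * 800 + [1] * 800

mean = statistics.mean(labels)
# describe() reports the sample standard deviation (n - 1 in the denominator),
# which is also what statistics.stdev computes.
std = statistics.stdev(labels)

print(mean, round(std, 6))
```

The value is slightly above 0.5 because of the `n - 1` denominator: the variance is 400/1599 rather than exactly 0.25.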


## Azure Open Datasets

Azure Open Datasets works much like Hugging Face Datasets: you install the library and then load the datasets dynamically.

In [27]:
from azureml.opendatasets import PublicHolidays
from datetime import datetime
from dateutil import parser
from dateutil.relativedelta import relativedelta

In [28]:
today = datetime.today()
last_year = today - relativedelta(months = 12)
hol = PublicHolidays(start_date = last_year, end_date = today)
hol_df = hol.to_pandas_dataframe()

[Info] read from C:\Users\ADMIN\AppData\Local\Temp\tmptsq2odaf\https%3A/%2Fazureopendatastorage.azurefd.net/holidaydatacontainer/Processed/part-00000-tid-8468414522853579044-35925ba8-a227-4b80-9c89-17065e7bf1db-649-c000.snappy.parquet


In [30]:
hol_df.describe()


| | countryOrRegion | holidayName | normalizeHolidayName | isPaidTimeOff | countryRegionCode | date |
|---|---|---|---|---|---|---|
| count | 563 | 563 | 563 | 32 | 520 | 563 |
| unique | 38 | 354 | 338 | 2 | 34 | 167 |
| top | Sweden | Søndag | Søndag | True | SE | 2022-12-25 00:00:00 |
| freq | 65 | 49 | 49 | 23 | 65 | 37 |
| first | | | | | | 2022-09-25 00:00:00 |
| last | | | | | | 2023-09-24 00:00:00 |


In [31]:
hol_df.head(10)

| | countryOrRegion | holidayName | normalizeHolidayName | isPaidTimeOff | countryRegionCode | date |
|---|---|---|---|---|---|---|
| 27455 | Norway | Søndag | Søndag | | NO | 2022-09-25 |
| 27456 | Sweden | Söndag | Söndag | | SE | 2022-09-25 |
| 27457 | Czech | Den české státnosti | Den české státnosti | | CZ | 2022-09-28 |
| 27458 | India | Gandhi Jayanti | Gandhi Jayanti | True | IN | 2022-10-02 |
| 27459 | Norway | Søndag | Søndag | | NO | 2022-10-02 |
| 27460 | Sweden | Söndag | Söndag | | SE | 2022-10-02 |
| 27461 | Germany | Tag der Deutschen Einheit | Tag der Deutschen Einheit | | DE | 2022-10-03 |
| 27462 | Portugal | Implantação da República | Implantação da República | | PT | 2022-10-05 |
| 27463 | Croatia | Dan neovisnosti | Dan neovisnosti | | HR | 2022-10-08 |
| 27464 | Norway | Søndag | Søndag | | NO | 2022-10-09 |


In [32]:
hol_df[hol_df["countryOrRegion"]=="Colombia"] 

| | countryOrRegion | holidayName | normalizeHolidayName | isPaidTimeOff | countryRegionCode | date |
|---|---|---|---|---|---|---|
| 27475 | Colombia | Descubrimiento de América [Discovery of Americ... | Descubrimiento de América [Discovery of America] | | CO | 2022-10-17 |
| 27504 | Colombia | Dia de Todos los Santos [All Saint's Day](Obse... | Dia de Todos los Santos [All Saint's Day] | | CO | 2022-11-07 |
| 27511 | Colombia | Independencia de Cartagena [Independence of Ca... | Independencia de Cartagena [Independence of Ca... | | CO | 2022-11-14 |
| 27533 | Colombia | La Inmaculada Concepción [Immaculate Conception] | La Inmaculada Concepción [Immaculate Conception] | | CO | 2022-12-08 |
| 27552 | Colombia | Navidad [Christmas] | Navidad [Christmas] | | CO | 2022-12-25 |
| 27685 | Colombia | Día de los Reyes Magos \[Epiphany](Observed) | Día de los Reyes Magos [Epiphany] | | CO | 2023-01-09 |
| 27729 | Colombia | Día de San José \[Saint Joseph's Day](Observed) | Día de San José [Saint Joseph's Day] | | CO | 2023-03-20 |
| 27742 | Colombia | Jueves Santo [Maundy Thursday] | Jueves Santo [Maundy Thursday] | | CO | 2023-04-06 |
| 27749 | Colombia | Viernes Santo [Good Friday] | Viernes Santo [Good Friday] | | CO | 2023-04-07 |
| 27829 | Colombia | Día del Trabajo [Labour Day] | Día del Trabajo [Labour Day] | | CO | 2023-05-01 |


### Diabetes dataset

In [33]:
from azureml.opendatasets import Diabetes

In [36]:
diabetes = Diabetes.get_tabular_dataset()
diabetes_df = diabetes.to_pandas_dataframe()



In [37]:
diabetes_df.head(10)

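| | AGE | SEX | BMI | BP | S1 | S2 | S3 | S4 | S5 | S6 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59 | 2 | 32.1 | 101.0 | 157 | 93.2 | 38.0 | 4.0 | 4.8598 | 87 | 151 |
| 1 | 48 | 1 | 21.6 | 87.0 | 183 | 103.2 | 70.0 | 3.0 | 3.8918 | 69 | 75 |
| 2 | 72 | 2 | 30.5 | 93.0 | 156 | 93.6 | 41.0 | 4.0 | 4.6728 | 85 | 141 |
| 3 | 24 | 1 | 25.3 | 84.0 | 198 | 131.4 | 40.0 | 5.0 | 4.8903 | 89 | 206 |
| 4 | 50 | 1 | 23.0 | 101.0 | 192 | 125.4 | 52.0 | 4.0 | 4.2905 | 80 | 135 |
| 5 | 23 | 1 | 22.6 | 89.0 | 139 | 64.8 | 61.0 | 2.0 | 4.1897 | 68 | 97 |
| 6 | 36 | 2 | 22.0 | 90.0 | 160 | 99.6 | 50.0 | 3.0 | 3.9512 | 82 | 138 |
| 7 | 66 | 2 | 26.2 | 114.0 | 255 | 185.0 | 56.0 | 4.55 | 4.2485 | 92 | 63 |
| 8 | 60 | 2 | 32.1 | 83.0 | 179 | 119.4 | 42.0 | 4.0 | 4.4773 | 94 | 110 |
| 9 | 29 | 1 | 30.0 | 85.0 | 180 | 93.4 | 43.0 | 4.0 | 5.3845 | 88 | 310 |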


In [42]:
diabetes_df.query("BMI < 18")

| | AGE | SEX | BMI | BP | S1 | S2 | S3 | S4 | S5 | S6 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|


In [41]:
diabetes_df.query("BMI < 19")

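| | AGE | SEX | BMI | BP | S1 | S2 | S3 | S4 | S5 | S6 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 22 | 1 | 18.6 | 97.0 | 114 | 57.6 | 46.0 | 2.0 | 3.9512 | 83 | 101 |
| 136 | 23 | 1 | 18.8 | 78.0 | 145 | 72.0 | 63.0 | 2.0 | 3.912 | 86 | 85 |
| 247 | 26 | 1 | 18.8 | 83.0 | 191 | 103.6 | 69.0 | 3.0 | 4.5218 | 69 | 51 |
| 281 | 23 | 2 | 18.0 | 78.0 | 171 | 96.0 | 48.0 | 4.0 | 4.9053 | 92 | 94 |
| 358 | 43 | 1 | 18.5 | 87.0 | 163 | 93.6 | 61.0 | 2.67 | 3.7377 | 80 | 90 |
| 381 | 29 | 2 | 18.1 | 73.0 | 158 | 99.0 | 41.0 | 4.0 | 4.4998 | 78 | 104 |
| 406 | 33 | 1 | 18.9 | 70.0 | 162 | 91.8 | 59.0 | 3.0 | 4.0254 | 58 | 72 |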


In [39]:
diabetes_df.query("BMI < 19").count()

AGE    7
SEX    7
BMI    7
BP     7
S1     7
S2     7
S3     7
S4     7
S5     7
S6     7
Y      7
dtype: int64
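
`DataFrame.query` filters rows with an expression string, and `count()` then reports the number of non-null values per column, which is why the same 7 repeats for every column. Below is a small sketch on a hand-copied subset of the rows shown above (only the `AGE` and `BMI` columns), comparing `query` with the equivalent boolean mask. It assumes pandas is installed.

```python
import pandas as pd

# AGE and BMI copied from rows 0, 1, 10, 136, 247, 281, 358, 381 and 406 above.
sample = pd.DataFrame(
    {
        "AGE": [59, 48, 22, 23, 26, 23, 43, 29, 33],
        "BMI": [32.1, 21.6, 18.6, 18.8, 18.8, 18.0, 18.5, 18.1, 18.9],
    },
    index=[0, 1, 10, 136, 247, 281, 358, 381, 406],
)

under_19 = sample.query("BMI < 19")
same_rows = sample[sample["BMI"] < 19]  # boolean-mask form of the same filter

print(len(under_19))                   # number of matching rows
print(under_19.equals(same_rows))      # both forms select the same rows
print(sample.query("BMI < 18").empty)  # the lowest BMI is 18.0, so nothing is below 18
```

When only the number of matching rows is needed, `len(...)` or `.shape[0]` gives a single number instead of one count per column.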