# Azure Key Vault
* Environment variables:
    * AZURE_CLIENT_ID: ID of the application registered in Azure Active Directory
    * AZURE_CLIENT_SECRET: the application's secret
    * AZURE_KV: name of the Key Vault
    * AZURE_TENANT_ID: ID of the Azure tenant
* Authorize the application in the vault's access policy, making sure to select the application as the principal
    * there is no need to select it under "authorized application"
* DefaultAzureCredential() starts with the environment variables and tries them in order: service principal with a secret, then with a certificate, and finally user/password. If none of these are set, it falls back to other credential types such as managed identity
* Key Vault uses Azure Active Directory (Azure AD) authentication, which requires an Azure AD security principal to grant access. An Azure AD security principal can be a user, an application service principal, a managed identity for Azure resources, or a group of any of these types
* Using a **managed identity** is recommended for applications deployed on Azure.
    * A service principal with a certificate is a possible alternative. In that scenario, the certificate should be stored in Key Vault and rotated frequently
    
* **Best practices**:
    * Local development: user principal or service principal with a secret
    * Test and development environments: managed identity, service principal with a certificate, or service principal with a secret
    * Production environment: managed identity or service principal with a certificate

* Managed identities free developers from having to manage credentials
    * For example, an application can use a managed identity to access resources such as Azure Key Vault:
        * where developers can store credentials securely,
        * or to access storage accounts.
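
Before running the cells below, it can help to check that the environment variables listed above are actually set. The following is a minimal sketch using only the standard library. The helper name `missing_azure_env_vars` is hypothetical, and `AZURE_KV` is this notebook's own convention rather than a variable read by the Azure SDK.

```python
import os

# Variables named in the list above. AZURE_KV is specific to this notebook;
# the other three are the ones the service-principal-with-secret login expects.
REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID", "AZURE_KV")


def missing_azure_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_azure_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.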

## 1- Creating secrets in Azure Key Vault

In [1]:
#https://docs.microsoft.com/en-us/azure/key-vault/secrets/quick-create-python
#pip install azure-identity
#pip install azure-keyvault-secrets

# create a secret

import os
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential

keyVaultName = os.environ["AZURE_KV"]
KVUri = f"https://{keyVaultName}.vault.azure.net"

credential = DefaultAzureCredential()
client = SecretClient(vault_url=KVUri, credential=credential)

secretName = input("Input a name for your secret > ")
secretValue = input("Input a value for your secret > ")

print(f"Creating a secret in {keyVaultName} called '{secretName}' with the value '{secretValue}' ...")

client.set_secret(secretName, secretValue)

print(" done.")

Input a name for your secret >  yom
Input a value for your secret >  yom


Creating a secret in nab-kv called 'yom' with the value 'yom' ...
 done.
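
Because the cell above takes the secret name from `input()`, a quick check before calling `set_secret` can avoid a rejected request. This sketch assumes the naming rule given in the Key Vault documentation (1 to 127 characters, limited to letters, digits and hyphens). The helper name `is_valid_secret_name` is hypothetical and not part of the Azure SDK.

```python
import re

# Assumed rule: 1-127 characters, only 0-9, a-z, A-Z and '-'.
_SECRET_NAME_RE = re.compile(r"[0-9A-Za-z-]{1,127}")


def is_valid_secret_name(name):
    """Return True if `name` matches the assumed Key Vault naming rule."""
    return _SECRET_NAME_RE.fullmatch(name) is not None


for candidate in ("nab-cs-language-secret", "my_secret", ""):
    print(repr(candidate), is_valid_secret_name(candidate))
```

Names containing underscores or spaces fail this check, so they can be caught locally before any call to the vault.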


## 2- Retrieving secrets from Azure Key Vault

In [None]:
# retrieve a secret
import os
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential

keyVaultName = os.environ["AZURE_KV"]
KVUri = f"https://{keyVaultName}.vault.azure.net"

credential = DefaultAzureCredential()
client = SecretClient(vault_url=KVUri, credential=credential)

print(f"Retrieving your secret from {keyVaultName}.")

secretName='nab-cs-language-secret'
retrieved_secret = client.get_secret(secretName)

print(f"Your secret is '{retrieved_secret.value}'.")

## 3- Deleting secrets from Azure Key Vault

In [None]:
# delete the secret
# (only works if the delete permission is granted in the access policy)
print(f"Deleting your secret from {keyVaultName} ...")

poller = client.begin_delete_secret(secretName)
deleted_secret = poller.result()

print(" done.")

# Cognitive Services: sentiment analysis, tested on 500 texts
  * Pricing:
    https://azure.microsoft.com/fr-fr/pricing/details/cognitive-services/language-service/
  * 5,000 texts (1,000 characters each) are free per month
  * €0.90 per 1,000 texts for the first 500,000, i.e. €450 for 500,000 texts of 1,000 characters

  * Limitations:
      * performance depends on a number of factors, such as
           * the subject domain,
           * the characteristics of the text being processed,
           * the use case of the system, and
           * how people interpret the system's output
      * the model is trained on product and service reviews
      * it has no understanding of the relative importance of different sentences sent together
      * it has difficulty recognizing sarcasm
      * the confidence score reflects the model's confidence in a given sentiment (positive, neutral, negative), not the intensity of that sentiment
      * data limits: https://docs.microsoft.com/fr-fr/azure/cognitive-services/language-service/sentiment-opinion-mining/how-to/call-api
      * rate limits also apply
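
The pricing arithmetic above can be sketched as a small estimator. It uses the €0.90 per 1,000 texts rate quoted in the list and assumes that a text longer than 1,000 characters is billed as several records. The function names are hypothetical, and the free monthly allowance is deliberately not deducted, matching the €450 figure above.

```python
import math

PRICE_PER_1000_RECORDS_EUR = 0.90  # rate quoted in the list above
CHARS_PER_RECORD = 1000            # assumed size of one billable text record


def count_text_records(texts):
    """Count billable records, assuming each started block of 1,000 characters is one record."""
    return sum(max(1, math.ceil(len(t) / CHARS_PER_RECORD)) for t in texts)


def estimate_cost_eur(n_records):
    """Estimate the cost in euros for n_records, ignoring the free allowance."""
    return n_records / 1000 * PRICE_PER_1000_RECORDS_EUR


print(estimate_cost_eur(500_000))                   # the 500,000-text case from the list
print(estimate_cost_eur(count_text_records(["a short tweet"] * 500)))  # the 500-tweet test
```

For tweets, which are far shorter than 1,000 characters, each text counts as a single record under this assumption.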

## 1- Importing the tweets

In [2]:
import pandas as pd
import time

start = time.time()
data = pd.read_csv('input/sentiment140/training.1600000.processed.noemoticon.csv', delimiter = ',', header=None)
print(time.time()-start)
print('the file contains', len(data.index), 'rows')
data.columns=['target','id','date','flag','user','text']
display(data)

2.8243205547332764
the file contains 1600000 rows


Unnamed: 0,target,id,date,flag,user,text
0,0,1467810369,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,_TheSpecialOne_,"@switchfoot http://twitpic.com/2y1zl - Awww, t..."
1,0,1467810672,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,scotthamilton,is upset that he can't update his Facebook by ...
2,0,1467810917,Mon Apr 06 22:19:53 PDT 2009,NO_QUERY,mattycus,@Kenichan I dived many times for the ball. Man...
3,0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,my whole body feels itchy and like its on fire
4,0,1467811193,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,Karoli,"@nationwideclass no, it's not behaving at all...."
...,...,...,...,...,...,...
1599995,4,2193601966,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,AmandaMarie1028,Just woke up. Having no school is the best fee...
1599996,4,2193601969,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,TheWDBoards,TheWDB.com - Very cool to hear old Walt interv...
1599997,4,2193601991,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,bpbabe,Are you ready for your MoJo Makeover? Ask me f...
1599998,4,2193602064,Tue Jun 16 08:40:49 PDT 2009,NO_QUERY,tinydiamondz,Happy 38th Birthday to my boo of alll time!!! ...


## 2- Shuffling and extracting 500 tweets, with confidence-score columns added

In [37]:
data1=data.loc[data.target==0,:]
data1_shuffled=data1.sample(frac=1).reset_index(drop=True)
display(data1_shuffled)
data2=data.loc[data.target==4,:]
data2_shuffled=data2.sample(frac=1).reset_index(drop=True)
display(data2_shuffled)

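# note: the 250 + 250 rows are taken from the head of data1/data2, not from the
# shuffled copies above; only the combined 500 rows are shuffled on the next line
data2acs=pd.concat([data1.iloc[0:250].copy(),data2.iloc[0:250].copy()])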
data2acs_shuffled=data2acs.sample(frac=1).reset_index(drop=True)

data2acs_shuffled['neg']=-1
data2acs_shuffled['neut']=-1
data2acs_shuffled['pos']=-1

display(data2acs_shuffled)

Unnamed: 0,target,id,date,flag,user,text
0,0,2064642004,Sun Jun 07 06:48:44 PDT 2009,NO_QUERY,fawcett94,god its so depressing to be back in england af...
1,0,2055179073,Sat Jun 06 08:50:52 PDT 2009,NO_QUERY,ikesonthereal,@shach7 lucky bastard!!! lol cloudy over here
2,0,2259170043,Sat Jun 20 17:07:53 PDT 2009,NO_QUERY,sneezymonica,"@dailydreamer No, I def won't make it back! jo..."
3,0,2051333783,Fri Jun 05 21:41:37 PDT 2009,NO_QUERY,katie0509,is actually sick from the medicine hopefully ...
4,0,2006027229,Tue Jun 02 10:32:42 PDT 2009,NO_QUERY,AnnefromTO,"@thewrongshoes aw man, I wish I was there, I c..."
...,...,...,...,...,...,...
799995,0,2203039686,Tue Jun 16 22:54:26 PDT 2009,NO_QUERY,Lurquer,"@pcsketch too many deadlines right now, and me..."
799996,0,2056973855,Sat Jun 06 12:08:16 PDT 2009,NO_QUERY,timho90,So cold inside at work. Two more hours at work...
799997,0,2262649661,Sat Jun 20 22:56:34 PDT 2009,NO_QUERY,mjstopani,"On the phone with my best friend, it's so sad...."
799998,0,2013443182,Tue Jun 02 22:41:17 PDT 2009,NO_QUERY,emcahu,@Joey_Lenzmeier it's the only way to talk to y...


Unnamed: 0,target,id,date,flag,user,text
0,4,2014192235,Wed Jun 03 00:44:26 PDT 2009,NO_QUERY,louisethom,lovely weather again
1,4,2189659044,Tue Jun 16 00:34:25 PDT 2009,NO_QUERY,PEETEE1980,"@cheydee this is amaing weather, hope it lasts..."
2,4,1880047403,Thu May 21 23:56:14 PDT 2009,NO_QUERY,silkyninja,@BabeNatasha awww you're the greatest!
3,4,1825047295,Sun May 17 04:06:26 PDT 2009,NO_QUERY,ZeroProduction,Awesome way to increase your followers http://...
4,4,1980956525,Sun May 31 08:04:56 PDT 2009,NO_QUERY,DinoGoesRawr,"Dark brown with red tint, have i ever told any..."
...,...,...,...,...,...,...
799995,4,1976480244,Sat May 30 17:26:16 PDT 2009,NO_QUERY,theyoungestkim,@DanteLaSalle Ready for FT2: Still Broke send...
799996,4,2062594892,Sat Jun 06 23:41:50 PDT 2009,NO_QUERY,irisush,@rubyzuby I liked your name
799997,4,2058145737,Sat Jun 06 14:21:23 PDT 2009,NO_QUERY,letsbetigers,McDick's called! Orientation tomorrow at three!
799998,4,2002560772,Tue Jun 02 04:33:51 PDT 2009,NO_QUERY,xoxo_leah,@Eminem You were class on Jonathan Ross


Unnamed: 0,target,id,date,flag,user,text,neg,neut,pos
0,4,1467823594,Mon Apr 06 22:23:05 PDT 2009,NO_QUERY,ClareStewart,@ALBinLA. I was just thinking about you tonig...,-1,-1,-1
1,4,1467863012,Mon Apr 06 22:33:24 PDT 2009,NO_QUERY,kristinloves,"has discovered that she loves easter crafts, e...",-1,-1,-1
2,0,1467835305,Mon Apr 06 22:26:10 PDT 2009,NO_QUERY,MissLaura317,"@januarycrimson Sorry, babe!! My fam annoys m...",-1,-1,-1
3,0,1467839477,Mon Apr 06 22:27:16 PDT 2009,NO_QUERY,becklyn13,Hanging in Crooners. Wanna sing. Can't. Sucks.,-1,-1,-1
4,4,1467823216,Mon Apr 06 22:23:00 PDT 2009,NO_QUERY,anyyankeest,@iJohn kitteh is sleepin on my crotch which pr...,-1,-1,-1
...,...,...,...,...,...,...,...,...,...
495,0,1467853356,Mon Apr 06 22:30:54 PDT 2009,NO_QUERY,dbmendel,Picked Mich St to win it all from the get go. ...,-1,-1,-1
496,0,1467844140,Mon Apr 06 22:28:32 PDT 2009,NO_QUERY,Twokids1,@rumblepurr lol.. wish they understood dayligh...,-1,-1,-1
497,0,1467860144,Mon Apr 06 22:32:38 PDT 2009,NO_QUERY,Jana1976,"@JonathanRKnight I hate the limited letters,to...",-1,-1,-1
498,0,1467836111,Mon Apr 06 22:26:22 PDT 2009,NO_QUERY,perrohunter,"@makeherfamous hmm , do u really enjoy being ...",-1,-1,-1
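
The cell above builds its 500-tweet sample from the first 250 rows of each class, so the sampled tweets all come from the start of the file. If a random, reproducible sample is preferred, one possible alternative is sketched below. It relies on `DataFrameGroupBy.sample` (pandas 1.1 or later). The function name `balanced_sample` and the `seed` parameter are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd


def balanced_sample(df, n_per_class=250, label_col="target", seed=0):
    """Draw n_per_class random rows per label, then shuffle the combined rows."""
    sampled = df.groupby(label_col).sample(n=n_per_class, random_state=seed)
    return sampled.sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy frame standing in for the sentiment140 data (0 = negative, 4 = positive).
toy = pd.DataFrame({"target": [0] * 5 + [4] * 5,
                    "text": [f"tweet {i}" for i in range(10)]})
sample = balanced_sample(toy, n_per_class=3)
print(sample["target"].value_counts())
```

On the real data this would be called as `balanced_sample(data)` before adding the `neg`, `neut` and `pos` columns.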


## 3- Calling Cognitive Services to analyze the sentiment of the tweets

In [None]:
#pip install azure-ai-textanalytics==5.1.0

# retrieve the API key from Key Vault
import os
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential
import time

keyVaultName = os.environ["AZURE_KV"]
KVUri = f"https://{keyVaultName}.vault.azure.net"

credential = DefaultAzureCredential()
client = SecretClient(vault_url=KVUri, credential=credential)


secretName='nab-cs-language-secret'
retrieved_secret = client.get_secret(secretName).value

###########################################################################


key = retrieved_secret
endpoint ="https://nab-cs-langage.cognitiveservices.azure.com/"

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Authenticate the client using your key and endpoint 
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
            endpoint=endpoint, 
            credential=ta_credential)
    return text_analytics_client

client = authenticate_client()

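# Example function for detecting sentiment in text, one request per batch of documents
def sentiment_analysis_example(client, data, batch_size=10):

    data2acs = data.copy()
    # the score columns are initialised with the integer -1; store the scores as floats
    score_cols = ['neg', 'neut', 'pos']
    data2acs[score_cols] = data2acs[score_cols].astype(float)
    col_pos = [data2acs.columns.get_loc(c) for c in score_cols]

    for i in range(0, data2acs.shape[0], batch_size):
        # oddly, there is a pause every 100 documents (possibly the service's rate limit)
        # the API allows 10 documents per request
        print(i)
        # slicing also handles a last batch with fewer than batch_size rows
        documents = data2acs.text.iloc[i:i + batch_size].tolist()
        response = client.analyze_sentiment(documents=documents)
        for j, doc in enumerate(response):
            if doc.is_error:
                continue  # keep the -1 placeholders for documents the service rejected
            scores = doc.confidence_scores
            # positional assignment avoids the chained-assignment pitfall of df[col].iloc[k] = ...
            data2acs.iloc[i + j, col_pos] = [scores.negative, scores.neutral, scores.positive]

    return data2acs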

start = time.time()
res=sentiment_analysis_example(client,data2acs_shuffled)
stime=time.time()-start

In [34]:
print(time.time()-start)
display(res)

61.757548809051514


Unnamed: 0,target,id,date,flag,user,text,neg,neut,pos
0,0,1467819650,Mon Apr 06 22:22:05 PDT 2009,NO_QUERY,antzpantz,@Viennah Yay! I'm happy for you with your job!...,0.00,0.00,0.99
1,0,1467815923,Mon Apr 06 22:21:07 PDT 2009,NO_QUERY,fatkat309,some1 hacked my account on aim now i have to ...,0.97,0.02,0.00
2,0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,my whole body feels itchy and like its on fire,0.94,0.05,0.01
3,4,1467823989,Mon Apr 06 22:23:12 PDT 2009,NO_QUERY,andwhenyousing,Across the Universe. Sleep. Rehearsal tomorrow.,0.04,0.92,0.04
4,4,1467823936,Mon Apr 06 22:23:11 PDT 2009,NO_QUERY,PamelaPJA,I think I met my first &quot;snob&quot; on twi...,0.83,0.16,0.01
...,...,...,...,...,...,...,...,...,...
95,4,1467822924,Mon Apr 06 22:22:55 PDT 2009,NO_QUERY,ddjuli,@nicolerichie: your picture is very sweet,0.00,0.00,0.99
96,4,1467824422,Mon Apr 06 22:23:19 PDT 2009,NO_QUERY,deeper2k,@stevecla it is a wallpaper with Red Square I ...,0.09,0.78,0.12
97,4,1467823109,Mon Apr 06 22:22:58 PDT 2009,NO_QUERY,mark_liu,@LordPov Are you meant to add on the back of t...,0.13,0.85,0.02
98,0,1467813579,Mon Apr 06 22:20:31 PDT 2009,NO_QUERY,starkissed,@LettyA ahh ive always wanted to see rent lov...,0.00,0.01,0.99
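
The batching used in the call above (at most 10 documents per request) can be isolated in a small generator, which also copes with a final batch shorter than 10. This is a sketch in plain Python. The name `iter_batches` is hypothetical, and no Azure call is made here.

```python
def iter_batches(items, batch_size=10):
    """Yield (start_index, batch) pairs; the last batch may be shorter than batch_size."""
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


texts = [f"tweet {i}" for i in range(25)]
for start, batch in iter_batches(texts):
    # in the notebook, each batch would be passed to client.analyze_sentiment(documents=batch)
    print(start, len(batch))
```

Keeping the start index alongside each batch makes it straightforward to write the scores back to the matching rows.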


## 4- Accuracy with the neutral class taken into account

In [6]:
import numpy as np
from sklearn.metrics import confusion_matrix,f1_score,classification_report

res['label3']=res.loc[:,['neg','neut','pos']].apply(np.argmax, axis=1)
res['target3']=res['target'].map({0:0,4:2})

print(classification_report(res['target3'],res['label3'],target_names=['0','1','2'],zero_division=0))

              precision    recall  f1-score   support

           0       0.79      0.57      0.67       235
           1       0.00      0.00      0.00         0
           2       0.76      0.60      0.67       265

    accuracy                           0.59       500
   macro avg       0.52      0.39      0.44       500
weighted avg       0.77      0.59      0.67       500



## 5- Accuracy without the neutral class

In [7]:

res['label2']=res.loc[:,['neg','pos']].apply(np.argmax, axis=1)
res['target2']=res['target'].map({0:0,4:1})


print(classification_report(res['target2'],res['label2'],target_names=['0','1'],zero_division=0))

              precision    recall  f1-score   support

           0       0.74      0.68      0.71       235
           1       0.73      0.78      0.76       265

    accuracy                           0.73       500
   macro avg       0.73      0.73      0.73       500
weighted avg       0.73      0.73      0.73       500
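
The gap between the two reports comes from how a label is derived from the three confidence scores. The sketch below reproduces both rules in plain Python, with ties resolved to the first maximum as `np.argmax` does. The function names are hypothetical, and the example scores are taken from a row of the results table shown earlier (neg 0.09, neut 0.78, pos 0.12, target 4).

```python
def label3(neg, neut, pos):
    """Three-class rule: 0 = negative, 1 = neutral, 2 = positive (first maximum wins)."""
    scores = [neg, neut, pos]
    return scores.index(max(scores))


def label2(neg, pos):
    """Two-class rule: ignore the neutral score; 0 = negative, 1 = positive."""
    return 0 if neg >= pos else 1


# A positive tweet (target 4) that the service scores as mostly neutral.
neg, neut, pos = 0.09, 0.78, 0.12
print(label3(neg, neut, pos))  # 1: predicted neutral, counted as an error since no tweet is labelled neutral
print(label2(neg, pos))        # 1: predicted positive, counted as correct
```

Since the dataset has no neutral tweets, every neutral prediction is an error under the three-class rule, which is consistent with the lower accuracy in section 4.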

