# Automatically labelling GitHub issues

In [1]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.14"

In [2]:
#r "nuget: Octokit, 9.0.0"
#r "nuget: Octokit.Reactive, 9.0.0"

In [None]:
#r "nuget:Microsoft.DotNet.Interactive.AIUtilities, 1.0.0-beta.24129.1"

In [4]:
using Azure;
using Azure.AI.OpenAI;
using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;
using Octokit;

In [6]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");
var chatDeployment = await Kernel.GetInputAsync("Provide chat deployment name");
var embeddingDeployment = await Kernel.GetInputAsync("Provide embedding deployment name");

## Access to GitHub
You will need an access token with rights to query and update issues.

In [8]:
var githubKey = await Kernel.GetPasswordAsync("Provide your GitHub API key");
var repoName = await Kernel.GetInputAsync("Provide repo");
var org = await Kernel.GetInputAsync("Provide org");

In [9]:
OpenAIClient openAIClient = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey.GetClearTextPassword()));

In [10]:
var options = new ApiOptions();
var gitHubClient = new GitHubClient(new ProductHeaderValue("notebook"));

if (!string.IsNullOrEmpty(githubKey.GetClearTextPassword())) {
    Console.WriteLine("Using GitHub API token");
    var tokenAuth = new Credentials(githubKey.GetClearTextPassword());
    gitHubClient.Credentials = tokenAuth;
} else {
    Console.WriteLine("Using anonymous GitHub API");
}

Using GitHub API token


In [11]:
var allLabels = await gitHubClient.Issue.Labels.GetAllForRepository(org, repoName);

In [12]:
allLabels.DisplayTable();

Id,Url,Name,NodeId,Color,Description,Default
4773058988,https://api.github.com/repos/dotnet/interactive/labels/Area-Accessibility,Area-Accessibility,LA_kwDODgj8L88AAAABHH8ZrA,5319e7,Relating to UI accessibility issues,False
5511620623,https://api.github.com/repos/dotnet/interactive/labels/Area-API,Area-API,LA_kwDODgj8L88AAAABSISoDw,5319e7,,False
4803279709,https://api.github.com/repos/dotnet/interactive/labels/Area-Auth,Area-Auth,LA_kwDODgj8L88AAAABHkw7XQ,5319e7,,False
2094097123,https://api.github.com/repos/dotnet/interactive/labels/Area-Automation,Area-Automation,MDU6TGFiZWwyMDk0MDk3MTIz,5319e7,Relating to non-interactive execution of notebooks and scripts,False
4084666155,https://api.github.com/repos/dotnet/interactive/labels/Area-Azure%20Data%20Studio,Area-Azure Data Studio,LA_kwDODgj8L87zdw8r,5319e7,,False
1907988999,https://api.github.com/repos/dotnet/interactive/labels/Area-Build%20&%20Infrastructure,Area-Build & Infrastructure,MDU6TGFiZWwxOTA3OTg4OTk5,5319e7,Relating to this repo's build and infrastructure,False
2065909664,https://api.github.com/repos/dotnet/interactive/labels/Area-C%23,Area-C#,MDU6TGFiZWwyMDY1OTA5NjY0,5319e7,Specific to C#,False
2110504572,https://api.github.com/repos/dotnet/interactive/labels/Area-Docker,Area-Docker,MDU6TGFiZWwyMTEwNTA0NTcy,5319e7,Specific to docker,False
1801690166,https://api.github.com/repos/dotnet/interactive/labels/Area-Documentation,Area-Documentation,MDU6TGFiZWwxODAxNjkwMTY2,5319e7,Improvements or additions to documentation,False
1835518355,https://api.github.com/repos/dotnet/interactive/labels/Area-F%23,Area-F#,MDU6TGFiZWwxODM1NTE4MzU1,5319e7,Specific to F#,False


The code below uses Octokit, a .NET client library for the GitHub API.

The first statement creates a `RepositoryIssueRequest` named `last6Months`, which holds the parameters for fetching issues from a repository. `Filter` is set to `IssueFilter.All`, so issues are returned regardless of how the authenticated user relates to them (assigned, created, mentioned, or subscribed). Note that `Filter` does not control open or closed state; that is governed by the separate `State` property, which this cell leaves at Octokit's default of open issues. `Since` is set to 180 days before the current time (`DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30*6))`), roughly six months, so only issues updated after that point are returned.

The second statement calls `gitHubClient.Issue.GetAllForRepository` with `org`, `repoName`, and the `last6Months` request, and returns the matching issues as a list. The call is asynchronous, so `await` suspends the cell until the request completes.

In [13]:
var last6Months = new RepositoryIssueRequest
{
    Filter = IssueFilter.All,
    Since = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30*6))
};
// Pass the request so that the Since filter is actually applied
var allIssues = await gitHubClient.Issue.GetAllForRepository(org, repoName, last6Months);
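
The `Since` cutoff in the cell above is computed as a fixed 180-day window, which only approximates six calendar months. The sketch below compares it with a calendar-aware offset. The reference date is a made-up value chosen so the result is reproducible; it is not taken from the notebook.

```csharp
using System;

// Fixed reference date (hypothetical) so the example is reproducible.
var now = new DateTimeOffset(2024, 3, 15, 0, 0, 0, TimeSpan.Zero);

// The notebook's approach: a fixed window of 30 * 6 = 180 days.
var fixedWindow = now.Subtract(TimeSpan.FromDays(30 * 6));

// Alternative: step back six calendar months.
var calendarWindow = now.AddMonths(-6);

Console.WriteLine($"180 days back: {fixedWindow:yyyy-MM-dd}");
Console.WriteLine($"6 months back: {calendarWindow:yyyy-MM-dd}");
```

For a rough "recent issues" filter the difference of a few days rarely matters, so either form is reasonable.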

In [14]:
if (!allIssues.Any(i => i.Labels.Count == 0)) {
    "No issues without labels, no need to proceed!".Display();
}

In [15]:
public record IssueWithEmbedding(Issue Issue, float[] Embedding);

The next cell uses a `foreach` loop that iterates over chunks of issues. `Chunk(16)` divides the `allIssues` collection into arrays of up to 16 issues each, so that each embeddings request carries a small batch of inputs rather than the whole collection.

Inside the loop, for each chunk of issues, the code first concatenates the title and body of each issue and truncates the resulting string to a maximum of 8191 tokens using the `tokenizer.TruncateByTokenCount(s,8191)` method. The resulting strings are then converted to an array.

Next, the code makes an asynchronous request to the Azure OpenAI service to generate embeddings for the text of each issue in the chunk. The `GetEmbeddingsAsync` method of the `openAIClient` object is used to make this request. The method takes an instance of `EmbeddingsOptions` as a parameter, which specifies the deployment of the embedding model and the text to be embedded.

The response from the AI service is then processed to extract the embeddings. The `Value.Data` property of the response contains the embeddings, which are converted to arrays and stored in the `embeddings` variable.

Finally, the code creates a new instance of `IssueWithEmbedding` for each issue in the chunk, associating each issue with its corresponding embedding. These instances are added to the `issuesWithEmbeddings` collection for further processing.
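
The batching and pairing pattern described above can be tried without calling any service. This is a minimal sketch: `fakeEmbed` is a hypothetical stand-in for the embeddings call that returns one vector per input in input order, and the issue titles are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the embeddings request:
// one single-element vector per input, returned in input order.
Func<string[], float[][]> fakeEmbed = inputs =>
    inputs.Select(s => new float[] { s.Length }).ToArray();

var titles = Enumerable.Range(1, 40).Select(n => $"Issue {n}").ToList();
var paired = new List<(string Title, float[] Embedding)>();
var batchSizes = new List<int>();

foreach (var chunk in titles.Chunk(16))
{
    batchSizes.Add(chunk.Length);
    var vectors = fakeEmbed(chunk);

    // Pair each input with its vector by position;
    // this relies on results coming back in input order.
    for (var i = 0; i < chunk.Length; i++)
    {
        paired.Add((chunk[i], vectors[i]));
    }
}

Console.WriteLine(string.Join(", ", batchSizes)); // 16, 16, 8
```

The last chunk is shorter than the rest, which is why the inner loop runs to `chunk.Length` rather than to a fixed 16.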

In [16]:
var issuesWithEmbeddings = new List<IssueWithEmbedding>();

var tokenizer = await Tokenizer.CreateAsync(TokenizerModel.ada2);

foreach(var chunk in allIssues.Chunk(16)){
    var text = chunk.Select(i => i.Title + "\n" + i.Body).Select(s => tokenizer.TruncateByTokenCount(s,8191)).ToArray();
    var response = await openAIClient.GetEmbeddingsAsync(new EmbeddingsOptions(embeddingDeployment, text));

    var embeddings = response.Value.Data.Select(e => e.Embedding.ToArray()).ToArray();
    for(var i = 0; i < chunk.Length; i++){
        issuesWithEmbeddings.Add(new IssueWithEmbedding(chunk[i], embeddings[i]));
    }
}

Error: Azure.RequestFailedException: Requests to the Embeddings_Create Operation under Azure OpenAI API version 2023-09-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.
Status: 429 (Too Many Requests)
ErrorCode: 429

Content:
{"error":{"code":"429","message": "Requests to the Embeddings_Create Operation under Azure OpenAI API version 2023-09-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit."}}

Headers:
x-rate-limit-reset-tokens: REDACTED
x-ms-client-request-id: 6286698b-1edb-47b5-a2f9-fdf0a1e53ed7
apim-request-id: REDACTED
Strict-Transport-Security: REDACTED
X-Content-Type-Options: REDACTED
policy-id: REDACTED
x-ms-region: REDACTED
x-ratelimit-remaining-requests: REDACTED
Date: Wed, 06 Dec 2023 12:49:31 GMT
Content-Length: 312
Content-Type: application/json

   at Azure.Core.HttpPipelineExtensions.ProcessMessageAsync(HttpPipeline pipeline, HttpMessage message, RequestContext requestContext, CancellationToken cancellationToken)
   at Azure.AI.OpenAI.OpenAIClient.GetEmbeddingsAsync(EmbeddingsOptions embeddingsOptions, CancellationToken cancellationToken)
   at Submission#14.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
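
The 429 response above means the embeddings loop hit the service's rate limit. One common way to cope is to retry the failed call with an increasing delay. The sketch below illustrates that idea only: `RetryAsync` is a hypothetical helper, not part of the Azure SDK or of this notebook, and the failing operation is simulated so that the block runs without any service.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries an async operation with exponential backoff.
async Task<T> RetryAsync<T>(
    Func<Task<T>> operation,
    Func<Exception, bool> isTransient,
    int maxAttempts = 5,
    int baseDelayMs = 100)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
        {
            // Delay doubles on each attempt: base, 2x base, 4x base, ...
            var delay = TimeSpan.FromMilliseconds(baseDelayMs * Math.Pow(2, attempt - 1));
            await Task.Delay(delay);
        }
    }
}

// Simulated operation that fails twice before succeeding.
var calls = 0;
var result = await RetryAsync(
    () =>
    {
        calls++;
        if (calls < 3) throw new InvalidOperationException("simulated 429");
        return Task.FromResult("ok");
    },
    ex => ex is InvalidOperationException,
    baseDelayMs: 10);

Console.WriteLine($"{result} after {calls} calls"); // ok after 3 calls
```

In the notebook, the operation would be the `GetEmbeddingsAsync` call and the transient check could test for a `RequestFailedException` whose `Status` is 429, matching the error shown above. Once the attempts are exhausted, the exception propagates to the caller unchanged.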

The following cell splits the `issuesWithEmbeddings` collection into two lists based on whether each issue already has labels.

`noLabels` holds the issues whose `Labels` collection is empty (`i => i.Issue.Labels.Count == 0`); these are the issues that need suggestions.

`labelled` holds the issues with one or more labels (`i => i.Issue.Labels.Count > 0`); these serve as examples for each label.

In both cases, `ToList` materializes the filtered sequence into a list.

In [17]:
var noLabels = issuesWithEmbeddings.Where(i => i.Issue.Labels.Count == 0).ToList();
var labelled = issuesWithEmbeddings.Where(i => i.Issue.Labels.Count > 0).ToList();

In [18]:
public class LabelWithEmbeddings{
    public Label Label {get;set;}
    public float[] Embedding {get;set;}
    public List<IssueWithEmbedding> Issues {get;init;} = new();
}

In [19]:
var labelsWithEmbeddings = new List<LabelWithEmbeddings>();

In [20]:
foreach(var label in allLabels.Where(e => e.Name.Contains("Area-"))){
    var issues = labelled.Where(i => i.Issue.Labels.Any(l => l.Name == label.Name)).ToList();
    if(issues.Count > 0){
        var labelWithEmbeddings = new LabelWithEmbeddings{
            Label = label,
            Issues = issues
        };
        labelsWithEmbeddings.Add(labelWithEmbeddings);
    }
}

In [21]:
foreach(var label in labelsWithEmbeddings){
    label.Embedding = label.Issues.Select(i => i.Embedding).Centroid();
}
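
The cell above gives each label a single embedding derived from the issues that carry it. The sketch below shows what that might look like, assuming `Centroid()` computes the element-wise mean of the vectors. `Mean` is a hypothetical helper written for illustration, not the AIUtilities implementation, and the two-dimensional vectors are made up.

```csharp
using System;
using System.Collections.Generic;

// Illustrative centroid: the element-wise mean of equal-length vectors.
float[] Mean(IReadOnlyList<float[]> vectors)
{
    var result = new float[vectors[0].Length];

    // Sum each dimension across all vectors...
    foreach (var v in vectors)
    {
        for (var i = 0; i < result.Length; i++)
        {
            result[i] += v[i];
        }
    }

    // ...then divide by the number of vectors.
    for (var i = 0; i < result.Length; i++)
    {
        result[i] /= vectors.Count;
    }
    return result;
}

var centroid = Mean(new[]
{
    new float[] { 1f, 0f },
    new float[] { 3f, 4f },
});

Console.WriteLine(string.Join(", ", centroid)); // 2, 2
```

A label with many issues ends up with a vector that sits in the middle of its examples, which is what the later similarity comparison measures against.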

In [22]:
var suggestions = new Dictionary<IssueWithEmbedding, LabelWithEmbeddings[]>();
foreach(var issue in noLabels){
    var suggestedLabels = labelsWithEmbeddings.ScoreBySimilarityTo(issue.Embedding, new CosineSimilarityComparer<float[]>(f => f), l => l.Embedding)
    .OrderByDescending( s => s.Score)
    .Where(s => s.Score > 0.85)
    .Take(5)
    .ToArray();
    suggestions.Add(issue, suggestedLabels.Select(s => s.Value).ToArray());
}

The cell above suggests labels for the unlabelled GitHub issues based on their embeddings.

The code starts by creating a new dictionary named `suggestions`. The keys in this dictionary are instances of `IssueWithEmbedding` and the values are arrays of `LabelWithEmbeddings`.

Next, a `foreach` loop iterates over each issue in the `noLabels` list. For each issue, `ScoreBySimilarityTo` scores every label against the issue's embedding. The `CosineSimilarityComparer<float[]>(f => f)` argument selects cosine similarity, a measure of the angle between two non-zero vectors, as the scoring function, and `l => l.Embedding` tells the method which vector to use for each label.

The scores are then sorted in descending order, filtered to those greater than 0.85, and cut to at most five entries. Each issue therefore gets up to five suggested labels, all with a similarity above 0.85.

Finally, the issue and its suggested labels are added to the `suggestions` dictionary. `Select(s => s.Value).ToArray()` extracts the `LabelWithEmbeddings` object from each scored result and collects them into an array.
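
To make the scoring step concrete, here is a self-contained sketch of cosine similarity followed by the same threshold and top-five selection. It is an illustration under stated assumptions, not the AIUtilities implementation: `Cosine` is a hypothetical helper, and the label names and two-dimensional vectors are invented.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Assumes equal-length, non-zero vectors.
double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical label centroids.
var labels = new (string Name, float[] Embedding)[]
{
    ("Area-A", new float[] { 1f, 0f }),
    ("Area-B", new float[] { 0.9f, 0.3f }),
    ("Area-C", new float[] { 0f, 1f }),
};

// Hypothetical embedding of an unlabelled issue.
var issue = new float[] { 1f, 0.05f };

var suggested = labels
    .Select(l => (l.Name, Score: Cosine(issue, l.Embedding)))
    .Where(s => s.Score > 0.85)        // keep only close matches
    .OrderByDescending(s => s.Score)   // best match first
    .Take(5)                           // at most five suggestions
    .Select(s => s.Name)
    .ToArray();

Console.WriteLine(string.Join(", ", suggested));
```

Here `Area-C` points in a nearly perpendicular direction to the issue vector, so its score falls far below the threshold and it is dropped. Filtering before or after sorting yields the same result; filtering first simply sorts fewer items.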

In [23]:
suggestions.Select(e => new {
    Issue = e.Key.Issue.Title,
    SuggestedLabels = e.Value.Select(l => l.Label.Name).ToArray()

}).DisplayTable();

Issue,SuggestedLabels
Issues with input prompt docs,"[ Area-F#, Area-PowerShell, Area-Packages and Extensions, Area-Documentation, Area-Polyglot Notebooks Extension ]"
.net interactive stuck loading nuget packagees,"[ Area-Packages and Extensions, Area-Installation, Area-F#, Area-Documentation, Area-PowerShell ]"
Wont run under .NET 8,"[ Area-Installation, Area-Packages and Extensions, Area-Documentation, Area-PowerShell, Area-F# ]"
Issues once .net 8 is installed.,"[ Area-Installation, Area-Packages and Extensions, Area-F#, Area-PowerShell, Area-Documentation ]"
Outputs from dotnet-repl are not displayed in by VS Code extension,"[ Area-Polyglot Notebooks Extension, Area-F#, Area-Jupyter Kernel, Area-Documentation, Area-PowerShell ]"
Printing values from R Type Provider prints garbled output,"[ Area-Formatting, Area-F# ]"
Polyglot Notebook: [DevExE2E][Regression] The kernelName and language show as csharp in the created Untitled-1.ipynb contents.,"[ Area-Polyglot Notebooks Extension, Area-F#, Area-JavaScript HTML CSS, Area-Jupyter Kernel, Area-Installation ]"
"Polyglot Notebook: [DevExE2E][Regression][intermittent]When running the cells one by one, test can't be stopped and always hang in running status.","[ Area-Polyglot Notebooks Extension, Area-Installation, Area-JavaScript HTML CSS, Area-PowerShell, Area-Packages and Extensions ]"
"Polyglot Notebook: [DevExE2E][Regression] After stopping the cell, no variable is shown in POLYGLOT NOTEBOOK: VARIABLES page.","[ Area-Polyglot Notebooks Extension, Area-JavaScript HTML CSS, Area-Installation, Area-PowerShell, Area-Jupyter Kernel ]"
Failed to connect to python kernel on mac,"[ Area-Python, Area-Jupyter Kernel, Area-Installation, Area-PowerShell, Area-Packages and Extensions ]"
