# Tf-Idf experiments on Cowait

In [9]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import numpy as np
import os

In [10]:
!git clone https://github.com/backtick-se/cowait

fatal: destination path 'cowait' already exists and is not an empty directory.


Some helper functions:

In [11]:
def find(start_dir, ext):
    """ Search files recursively """
    files = []
    for file in os.listdir(start_dir):
        path = os.path.join(start_dir, file)
        if os.path.isdir(path):
            files += find(path, ext)
        elif os.path.isfile(path) and file.endswith(ext):
            files.append(path)
    return files

def read(path):
    """ Read a whole text file """
    with open(path, encoding="utf-8") as f:
        return f.read()
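As an aside, `find` could also be written with `pathlib`. The sketch below uses a hypothetical `find_with_pathlib` and a throwaway directory tree, not the Cowait repo. It also sorts the result: `os.listdir` returns entries in arbitrary order, so the order of `doc_files`, and with it the order of tied matches later on, could otherwise differ between machines.

```python
import tempfile
from pathlib import Path

def find_with_pathlib(start_dir, ext):
    """Hypothetical pathlib variant of find(): recursive search, sorted result."""
    return sorted(str(p) for p in Path(start_dir).rglob(f"*{ext}") if p.is_file())

# Build a small temporary tree to try it on (made-up files, not the Cowait repo).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "tasks").mkdir(parents=True)
    (root / "README.md").write_text("# readme")
    (root / "docs" / "tasks" / "rpc.md").write_text("# rpc")
    (root / "setup.py").write_text("")  # wrong extension, should be skipped
    found = find_with_pathlib(tmp, ".md")
    names = [Path(p).relative_to(root).as_posix() for p in found]

print(names)  # ['README.md', 'docs/tasks/rpc.md']
```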

Load the docs files (Markdown) from Cowait as the corpus

In [12]:
doc_files = find('cowait', '.md')
corpus = [read(file) for file in doc_files]
print(len(corpus))

34


The model: tf-idf with sklearn

In [13]:
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)

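def match(text, tfidf):
    """ Rank all docs by cosine similarity to text, best match first """
    query = vectorizer.transform([text])
    # tf-idf rows are L2-normalised by default, so the linear kernel is the cosine similarity
    cosine_sims = linear_kernel(query, tfidf).flatten()
    return sorted(zip(cosine_sims, doc_files), reverse=True)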
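The cell above leans on two `TfidfVectorizer` defaults. With `smooth_idf=True` the inverse document frequency is $\mathrm{idf}(t) = \ln\frac{1+n}{1+\mathrm{df}(t)} + 1$, and with `norm="l2"` every row has unit length, which is why `linear_kernel` can stand in for cosine similarity. A small check on a made-up toy corpus (not the Cowait docs):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Toy corpus, made up for illustration.
toy_corpus = [
    "rpc call race condition",
    "docker for mac workaround",
    "rpc call docs",
]
vec = TfidfVectorizer()  # defaults: smooth_idf=True, norm="l2"
X = vec.fit_transform(toy_corpus)

# Recompute idf by hand: document frequency = number of rows where the term is non-zero.
n_docs = X.shape[0]
df = np.asarray((X > 0).sum(axis=0)).ravel()
manual_idf = np.log((1 + n_docs) / (1 + df)) + 1
print(np.allclose(manual_idf, vec.idf_))  # True

# Unit-length rows: a plain dot product gives the same matrix as cosine similarity.
print(np.allclose(linear_kernel(X, X), cosine_similarity(X)))  # True
```

If the vectorizer were created with `norm=None`, the two matrices would no longer match and `match` would need `cosine_similarity` instead.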

Test matching text from a PR against the docs

In [14]:
pr = """
Fix RPC call race condition #309

refactor to avoid a potential race condition where the RpcCall
object may be deleted from the pending call list before reaching
the wrap_future resolve #308

- fix a race condition in RPC calls

"""

match(pr, tfidf)[:3]

[(0.31527326691095464, 'cowait/docs/tasks/remote-procedure-calls.md'),
 (0.12824104782947884, 'cowait/docs/tasks/task-lifecycle-methods.md'),
 (0.12419023110943396, 'cowait/docs/get-started/next-steps.md')]
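`match` reads `vectorizer` and `doc_files` from notebook globals and sorts every document even though only the top three are shown. A hypothetical variant, `match_top_k`, takes everything as arguments and keeps the `k` best. A stable sort on the negated scores leaves ties in corpus order. The documents here are made-up stand-ins, not the real Cowait docs:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def match_top_k(text, vectorizer, tfidf, names, k=3):
    """Hypothetical variant of match(): return the k best (score, name) pairs."""
    query = vectorizer.transform([text])
    sims = linear_kernel(query, tfidf).ravel()
    # Stable sort on negated scores: highest first, ties keep corpus order.
    best = np.argsort(-sims, kind="stable")[:k]
    return [(float(sims[i]), names[i]) for i in best]

# Toy documents standing in for the Cowait docs (made up for illustration).
names = ["rpc.md", "docker.md", "install.md"]
docs = [
    "remote procedure calls between tasks",
    "building and pushing docker images",
    "install cowait with pip",
]
vec = TfidfVectorizer()
tfidf_toy = vec.fit_transform(docs)
result = match_top_k("race condition in rpc calls", vec, tfidf_toy, names, k=2)
print(result)
```

Only `calls` from the query is in the toy vocabulary, so `rpc.md` is the only document with a non-zero score. Words the corpus never uses (here `rpc`, `race`) are ignored entirely by the fitted vectorizer.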

In [15]:
pr = """
Workaround for Docker API response problem in Docker for Mac #200

Fixes issues caused by the problem described here: docker/docker-py#2696

- implement workaround for docker for mac problem
fixes docker/docker-py#2696 by catching
exceptions.
- bump version to 0.3.4

"""

match(pr, tfidf)[:3]

[(0.16879200401235867, 'cowait/docs/get-started/installation.md'),
 (0.16849455468675906, 'cowait/docs/get-started/building-and-pushing.md'),
 (0.16819006952937532, 'cowait/README.md')]

In [16]:
print(read('cowait/docs/get-started/installation.md'))

---
title: Installation
---

Installing Cowait on your local machine.

## Requirements

Cowait is a python library that packages and runs tasks in Docker containers, both locally and on [Kubernetes](https://kubernetes.io/). The base requirements are:

- Python 3.6+
- [Docker](https://docs.docker.com/get-docker/)

## Installation

Cowait is available on [Pypi](https://pypi.org/project/cowait/), you can install it with `pip`:

```shell
python -m pip install cowait
```

We recommend installing in a virtual environment ([virtualenv](https://github.com/pypa/virtualenv)/[venv](https://docs.python.org/3/library/venv.html)) or using a python package manager such as [Poetry](https://python-poetry.org/) or [Pipenv](https://pipenv.pypa.io/en/latest/).

To quickly get started with Cowait, we provide a slim Docker image (~59 MB) that includes the Cowait library. It is based on this [Dockerfile](https://github.com/backtick-se/cowait/blob/master/Dockerfile). Pull the latest image.

```shell
docker pu