![The Harmony Project logo](https://raw.githubusercontent.com/harmonydata/brand/main/Logo/PNG/%D0%BB%D0%BE%D0%B3%D0%BE%20%D1%84%D1%83%D0%BB-05.png)

<a href="https://harmonydata.ac.uk"><span align="left">🌐 harmonydata.ac.uk</span></a>
<a href="https://github.com/harmonydata/harmony"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/github-mark.svg" alt="Harmony Python package | Github" width="21px"/></a>
<a href="https://www.linkedin.com/company/harmonydata"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/linkedin.svg" alt="Harmony | LinkedIn" width="21px"/></a>
<a href="https://twitter.com/harmony_data"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/x.svg" alt="Harmony | X" width="21px"/></a>
<a href="https://www.instagram.com/harmonydata/"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/instagram.svg" alt="Harmony | Instagram" width="21px"/></a>
<a href="https://www.facebook.com/people/Harmony-Project/100086772661697/"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/fb.svg" alt="Harmony | Facebook" width="21px"/></a>
<a href="https://www.youtube.com/channel/UCraLlfBr0jXwap41oQ763OQ"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/yt.svg" alt="Harmony | YouTube" width="21px"/></a>

# Harmony walkthrough - Python library

You can run this notebook in Google Colab: <a href="https://colab.research.google.com/github/harmonydata/harmony/blob/main/Harmony_example_walkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows how to use Harmony to compute a similarity matrix between the items of two questionnaires from Harmony's built-in examples and a third questionnaire that you enter here yourself (the first two items of the Norwegian GAD-7).

Harmony is a data harmonisation tool that uses natural language
processing to recognise where questions in questionnaires are semantically similar. Harmony is a collaboration project between [Ulster University](https://ulster.ac.uk/), [University College London](https://ucl.ac.uk/), the [Universidade Federal de Santa Maria](https://www.ufsm.br/), and [Fast Data Science](http://fastdatascience.com/).  Harmony is funded by [Wellcome](https://wellcome.org/) as part of the [Wellcome Data Prize in Mental Health](https://wellcome.org/grant-funding/schemes/wellcome-mental-health-data-prize).
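A common way to measure how semantically similar two questions are is to turn each question into a numeric vector (an embedding) with a language model and then compare the vectors with cosine similarity. The sketch below illustrates only the comparison step. The three vectors are made up for illustration and are not produced by Harmony's model, and `cosine_similarity` and `similarity_matrix` are our own helper names, not Harmony functions:

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarity: entry [i][j] compares item i with item j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical toy vectors standing in for embeddings of three questions.
vectors = [
    [0.9, 0.1, 0.0],  # e.g. "Feeling nervous"
    [0.8, 0.2, 0.1],  # e.g. "Feeling anxious"
    [0.0, 0.1, 0.9],  # e.g. "Trouble sleeping"
]

for row in similarity_matrix(vectors):
    print([round(x, 2) for x in row])
```

Related questions score close to 1 and unrelated ones close to 0. Real embeddings have hundreds of dimensions, but the arithmetic is the same.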

This walkthrough compares questionnaire items whose text has already been extracted from PDFs. If you want Harmony to process PDFs itself, you also need to install
Java and [Apache Tika](https://tika.apache.org/); see the Harmony README.

![Status: In Development](https://badgen.net/badge/Status/In%20Development/orange)

[![PyPI package](https://img.shields.io/badge/pip%20install-harmonydata-brightgreen)](https://pypi.org/project/harmonydata/)


## Install the Harmony Python library from PyPI

```python
!pip install harmonydata
!pip install matplotlib
```

## Import the library and check the version

```python
import harmony
```

What version of Harmony are we on?

```python
harmony.__version__
```
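The package is published on PyPI as `harmonydata`, while the module you import is `harmony`. If the import fails, or you want to check the version without importing the package, Python's standard `importlib.metadata` module can read the installed version from pip's metadata. A small sketch; the helper name `installed_version` is our own:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "harmonydata" is the name passed to pip; the module you import is "harmony".
print(installed_version("harmonydata"))
```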

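Create an instrument from your own list of question texts. Here we enter the first two items of the Norwegian GAD-7:

```python
from harmony import create_instrument_from_list

# First two GAD-7 items in Norwegian. Standard English wording:
# "Feeling nervous, anxious or on edge"; "Not being able to stop or control worrying".
gad_7_norwegian = create_instrument_from_list(
    ["Følt deg nervøs, engstelig eller veldig stresset",
     "Ikke klart å slutte å bekymre deg eller kontrollere bekymringene dine"],
    instrument_name="GAD-7 Norwegian")
```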

Put the new instrument alongside two of the example instruments that ship with Harmony:

```python
instruments = [harmony.example_instruments["CES_D English"],
               harmony.example_instruments["GAD-7 Portuguese"],
               gad_7_norwegian]
```

Match the questions across all three instruments:

```python
questions, similarity, query_similarity, new_vectors_dict = harmony.match_instruments(instruments)
```
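The similarity matrix compares every question with every other question, so its rows and columns follow the order of the flat `questions` list. Assuming that list holds each instrument's items in turn, in the order the instruments were passed in (an assumption about the ordering that this notebook does not check), a hypothetical helper `locate` can map a row number back to an instrument and an item within it:

```python
from bisect import bisect_right
from itertools import accumulate


def locate(flat_index: int, sizes: list[int]) -> tuple[int, int]:
    """Map a position in a concatenated question list to (instrument index, item index).

    `sizes` gives the number of items in each instrument, in concatenation order.
    """
    if not 0 <= flat_index < sum(sizes):
        raise IndexError("flat_index out of range")
    # Cumulative end offsets, e.g. sizes [3, 2, 2] -> ends [3, 5, 7].
    ends = list(accumulate(sizes))
    # The first instrument whose end offset is strictly greater than flat_index owns it.
    instrument = bisect_right(ends, flat_index)
    start = ends[instrument] - sizes[instrument]
    return instrument, flat_index - start


sizes = [3, 2, 2]  # hypothetical item counts for three instruments
print(locate(4, sizes))  # (1, 1)
```

With the objects in this notebook, `sizes` might be built as `[len(i.questions) for i in instruments]`, assuming Harmony's instrument objects expose their items as a `questions` list.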

See the questions

```python
for q in questions:
    print(q.question_text)
```

See the similarity matrix

```python
similarity
```
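To read the matrix programmatically, you might list, for each question, the most similar question from a different instrument. A minimal pure-Python sketch, assuming the matrix is a square list of lists (a NumPy array can be converted with `.tolist()`) and that you know which instrument each row belongs to. `best_matches`, the toy matrix and the owner labels are all hypothetical:

```python
def best_matches(matrix: list[list[float]], owners: list[str]) -> list[tuple[int, int, float]]:
    """For each row i, return (i, j, score) for the highest-scoring column j
    whose item belongs to a different instrument than item i."""
    results = []
    for i, row in enumerate(matrix):
        candidates = [(score, j) for j, score in enumerate(row) if owners[j] != owners[i]]
        if not candidates:
            continue  # no items from other instruments to compare against
        score, j = max(candidates)
        results.append((i, j, score))
    return results


# Hypothetical 4x4 matrix: items 0-1 come from instrument "A", items 2-3 from "B".
matrix = [
    [1.00, 0.40, 0.85, 0.10],
    [0.40, 1.00, 0.20, 0.70],
    [0.85, 0.20, 1.00, 0.30],
    [0.10, 0.70, 0.30, 1.00],
]
owners = ["A", "A", "B", "B"]

for i, j, score in best_matches(matrix, owners):
    print(f"item {i} -> item {j} ({score:.2f})")
```

Excluding same-instrument columns also skips the diagonal, where every question trivially matches itself.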

## Plot the similarity matrix

```python
import matplotlib.pyplot as plt
```

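```python
# With the 'hot' colour map, brighter cells mean more similar question pairs
plt.imshow(similarity, cmap='hot', interpolation='nearest')
plt.colorbar(label="Similarity")
plt.show()
```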

## Generate a crosswalk table

```python
from harmony.matching.generate_crosswalk_table import generate_crosswalk_table
```

```python
# Minimum similarity score for a pair of questions to be listed in the crosswalk table
threshold = 0.6
```

```python
df_crosswalk_table = generate_crosswalk_table(questions, similarity, threshold)
```

```python
df_crosswalk_table
```
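Conceptually, a crosswalk keeps only pairs of items from different instruments whose similarity clears the threshold. The sketch below illustrates that idea in plain Python on a toy matrix. It is not Harmony's `generate_crosswalk_table`, whose exact rules (for example, how it treats negative scores or an item matching several others) may differ, and the function name `crosswalk_pairs` is hypothetical:

```python
def crosswalk_pairs(matrix: list[list[float]], owners: list[str],
                    threshold: float) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for each pair i < j from different instruments with
    score >= threshold, sorted from most to least similar."""
    pairs = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, so each pair appears once
            if owners[i] != owners[j] and matrix[i][j] >= threshold:
                pairs.append((i, j, matrix[i][j]))
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs


# Hypothetical 4x4 matrix: items 0-1 come from instrument "A", items 2-3 from "B".
matrix = [
    [1.00, 0.40, 0.85, 0.10],
    [0.40, 1.00, 0.20, 0.70],
    [0.85, 0.20, 1.00, 0.30],
    [0.10, 0.70, 0.30, 1.00],
]
owners = ["A", "A", "B", "B"]

for threshold in (0.9, 0.6, 0.1):
    print(threshold, crosswalk_pairs(matrix, owners, threshold))
```

Raising the threshold yields fewer, more confident matches; lowering it yields more candidate pairs to review by hand.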