# Experimental Study: Tracing Evolutionary Changes in APIs

## author: Kristiyan Michaylov
## supervisor: Jacob Krüger
---

This case study aims to conduct an experimental analysis on Java-based API datasets. The primary examined datasets are **JUnit**, **Log4J**, **Apache Commons IO**, and **Project Lombok**, and they form the basis of the results presented in our research paper. Additionally, a combined dataset incorporating all the aforementioned APIs is constructed and evaluated to assess aggregated performance and insights.
## Research Question (RQ)
This case study attempts to answer the following research question:
> To what extent can an automated machine learning technique analyze and categorise the causes of changes in Java APIs?
## Process
Given the RQ, the goal is to investigate the feasibility of machine learning approaches for categorising the causes of changes. We achieve this via the following high-level steps:
- Read the dataset files and analyze the data via visualisation techniques
- Extract and clean the input from the *Changes* column, then apply Natural Language Processing techniques such as tokenization, stop-word removal, and stemming or lemmatization
- Train the machine learning (ML) models and tune their hyper-parameters to perform classification
- Check the performance of the models with metrics such as accuracy, recall, precision, and F1-score

In [None]:
import pandas as pd
import re
import numpy as np
import nltk
import os
from collections import Counter
from nltk.tokenize import word_tokenize
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from nltk.stem import WordNetLemmatizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, recall_score, precision_score, f1_score, \
    confusion_matrix


First, we set `dataset_name`, which is used in the file names of the diagram outputs. Then, `file_path` provides the path from which the dataset is read.

In [None]:
# file_path = "../resources/excel-sheets/JUnit.xlsx"
# sheet_name = "JUnit"
dataset_name = 'JUnit'
file_path = "../resources/datasets/JUnit.csv"

After defining the path, we create a Pandas dataframe that contains the data. To validate that the data is loaded, the first 5 rows are printed.

In [None]:
try:
    data = pd.read_csv(file_path, sep=',')
except pd.errors.ParserError:
    data = pd.read_csv(file_path, sep=';')
print(data.head())

### Data Exploration and Analysis
First, we explore and visualize parts of the dataset. In particular, we check which words are associated with which classes, how frequent they are, how the data is distributed, and which popular words or noise it contains. This information later helps to decide which parts of the data are important and which are not.

#### Wordcloud
First, we create a word cloud to see the most common words. Stopwords and punctuation symbols are excluded.

In [None]:
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

words = [str(item) for item in data['Changes']]
# print(words)
joined_sentences = " ".join(words)

wordcloud = WordCloud(
    background_color='white',
    stopwords=STOPWORDS,
    max_font_size=25,
    scale=3
)

wordcloud = wordcloud.generate(joined_sentences)

fig = plt.figure(1, figsize=(15, 15))
plt.axis('off')
plt.imshow(wordcloud)
plt.show()


#### Ngrams
We show the most frequent n-grams, which are contiguous sequences of n words (here bigrams, n = 2). This gives information on how the data is dispersed and which word combinations dominate it.
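
To make the definition concrete, the following dependency-free sketch extracts bigrams from a made-up release note. The helper name `extract_ngrams` and the sample sentence are illustrative assumptions; the notebook itself relies on `CountVectorizer` for this step.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical release note, used only for illustration
note = "fixed null pointer exception in test runner and fixed null pointer in parser"
tokens = note.split()

bigram_counts = Counter(extract_ngrams(tokens, 2))
print(bigram_counts.most_common(2))
```

A sequence of k tokens yields k - n + 1 n-grams, so short release notes contribute only a handful of bigrams each.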

In [None]:
# n defaults to 2 (bigrams); a default of None would produce an invalid ngram_range
def get_n_gram(dt, n=2, stopwords='english'):
    dt = dt.dropna().astype(str)
    vec = CountVectorizer(ngram_range=(n, n), stop_words=stopwords).fit(dt)
    # We get a sparse matrix corresponding to the occurrence of words per column
    sparse_matrix = vec.transform(dt)
    sum_words = sparse_matrix.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:10]


bi_grams = get_n_gram(data['Changes'], 2)

x, y = map(list, zip(*bi_grams))
sns.barplot(x=y, y=x)

#### Text exploration
We check the text properties such as length, most common words and distribution overall and per category. This includes stopwords since we are interested in the length of each release note.

In [None]:
data_clean = data['Changes'].dropna().astype(str)
word_count = data_clean.str.split().apply(len)

plt.figure(figsize=(10, 6))
sns.barplot(x=word_count.value_counts().index, y=word_count.value_counts().values, palette='viridis',
            hue=word_count.value_counts().index)
plt.title('Word Count Frequency Distribution per Release Log Change')
plt.xlabel('Number of Words in "Changes" Column')
plt.ylabel('Frequency')
plt.xticks(rotation=90)
plt.show()

# Histogram
plt.figure(figsize=(10, 6))
sns.histplot(word_count, kde=True, color='skyblue')
plt.title('Distribution of Word Counts in "Changes" Column')
plt.xlabel('Number of Words in "Changes" column')
plt.ylabel('Frequency')
plt.show()

Here, the most common words per category are presented.


In [None]:
def separate_into_words(text):
    stop_words = set(stopwords.words("english"))
    if isinstance(text, str):
        words_in_change = text.split()
        return [word.lower() for word in words_in_change if re.match(r'^[a-zA-Z]+$',
                                                                     word) and word.casefold() not in stop_words]
    else:
        return []


category_words = data.groupby('General Category')['Changes'].apply(
    lambda texts: [word for text in texts for word in separate_into_words(text)]
)

category_word_freq = category_words.apply(Counter)

for category, freq in category_word_freq.items():
    most_common = dict(freq.most_common(10))

    plt.figure(figsize=(10, 6))
    sns.barplot(x=list(most_common.keys()), y=list(most_common.values()))

    file_name = category.replace(" ", "_")

    plt.title(f"Top Words in Category: {category}")
    plt.xticks(rotation=45)
    plt.tight_layout()

    # plt.savefig(f"../resources/diagrams/top_words_{file_name}.png", format='png')

    plt.show()
    plt.close()



Next, the most popular words across all categories are shown.


In [None]:
all_words = [word for text in data['Changes'] for word in separate_into_words(text)]

word_freq = Counter(all_words)

most_common = dict(word_freq.most_common(10))

plt.figure(figsize=(10, 6))
sns.barplot(x=list(most_common.keys()), y=list(most_common.values()))

plt.title("Top 10 Most Popular Words")
plt.xticks(rotation=45)
plt.tight_layout()

# plt.savefig("../resources/diagrams/top_words_all.png", format='png')
plt.show()
plt.close()

#### Category distribution
We check how the different categories are distributed over the release log changes.

In [None]:
import matplotlib.pyplot as plt
from collections import Counter

categories = data["General Category"].dropna().astype(str)
category_frequency = Counter(categories)
category_names = list(category_frequency.keys())
category_frequencies = list(category_frequency.values())

output_dir = "../resources/data-exploration"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

colors = plt.get_cmap("tab20", len(category_names))
fig, ax = plt.subplots(figsize=(10, 6))
pie_chart = ax.pie(category_frequencies, labels=category_names, colors=colors(np.arange(len(category_names))),
                   autopct='%1.1f%%')

# ax.set_title("Pie Chart Distribution of Categories")
plt.xticks(rotation=90)
plt.tight_layout()

# Use a dedicated name so the dataset's `file_path` is not overwritten
pie_chart_path = os.path.join(output_dir, f"pie_chart_{dataset_name}.png")

# plt.savefig(pie_chart_path, format='png')

plt.show()

plt.close()


In [None]:
categories = data["General Category"].dropna().astype(str)
category_frequency = Counter(categories)
category_names = list(category_frequency.keys())
category_frequencies = list(category_frequency.values())

output_dir = "../resources/data-exploration"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

colors = plt.get_cmap("tab20", len(category_names))

fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(x=category_names, height=category_frequencies, color=colors(np.arange(len(category_names))))

ax.set_title("Distribution of Categories")
ax.set_xlabel("Category")
ax.set_ylabel("Frequency")

plt.xticks(rotation=90)
plt.tight_layout()

# Use a dedicated name so the dataset's `file_path` is not overwritten
bar_plot_path = os.path.join(output_dir, f"bar_plot_{dataset_name}.png")

plt.savefig(bar_plot_path, format='png')

plt.show()

plt.close()

### 1 Clean the Dataset and Tokenize
First, we drop rows with an empty "Changes" entry and remove the ten most frequent words from the column. We then tokenize each entry, keep only purely alphabetic tokens, lower-case them, and remove stopwords, so that the output is cleaner and easier to analyse in later steps. Rows left without any tokens are discarded.
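
The token filter can be sketched without any downloads as follows. This is a simplified illustration: `STOP_WORDS` is a tiny hand-picked set rather than NLTK's English list, `str.split` stands in for `word_tokenize`, and the sample release note is hypothetical.

```python
import re

# Tiny illustrative stop-word set; the notebook uses NLTK's full English list
STOP_WORDS = {"the", "a", "an", "in", "to", "of", "and", "is", "for"}
ALPHABETIC = re.compile(r"^[a-zA-Z]+$")


def clean_tokens(text):
    """Lower-case alphabetic tokens that are not stop words."""
    # str.split stands in for nltk.word_tokenize in this sketch
    return [
        token.lower()
        for token in text.split()
        if ALPHABETIC.match(token) and token.casefold() not in STOP_WORDS
    ]


# Hypothetical release note
print(clean_tokens("Fixed the NPE in Assert.assertEquals for JUnit 4.12"))
```

Note that the alphabetic filter also discards identifiers such as `Assert.assertEquals` and version numbers, which may or may not be desirable for API release notes.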

In [None]:
# Make sure to download the following
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

data.dropna(inplace=True, subset="Changes")
print(data["Changes"].head())


def find_most_common_words(count):
    cnt = Counter()
    for text in data["Changes"].values:
        for word in text.split():
            cnt[word] += 1
    return cnt.most_common(count)


def remove_frequent_words(text):
    return " ".join([word for word in str(text).split() if word not in most_frequent_words])


most_frequent_words = set([w for (w, wc) in find_most_common_words(10)])

data["Changes"] = data["Changes"].apply(lambda text: remove_frequent_words(text))

data["Tokens"] = data["Changes"].apply(word_tokenize)

stop_words = set(stopwords.words("english"))

data["Tokens"] = data["Tokens"].apply(lambda tokens: [word.lower() for word in tokens
                                                      if re.match(r'^[a-zA-Z]+$', word)
                                                      and word.casefold() not in stop_words])

data = data[data["Tokens"].apply(lambda tokens: len(tokens) > 0)]

print("Data without stop words and solely alphabetical tokens: ")
print(data["Tokens"].head())
print(data[["Tokens", "General Category"]].to_string())


In the next parts, we will focus on preparing the data for the ML classifiers.
### 2. Prepare the output
The next step is to stem or lemmatize the tokens. A stemmer reduces each word to its stem by stripping affixes, whereas a lemmatizer maps each word to its dictionary form (lemma) and thereby preserves its meaning. We explore both options in our experiment.
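
The conceptual difference can be shown with a toy sketch. Neither function below reproduces the NLTK algorithms used in the notebook: `toy_stem` applies a few hard-coded suffix rules, and `toy_lemmatize` uses a hypothetical four-entry lookup table in place of a WordNet-style dictionary.

```python
# Toy illustration only: neither function reproduces NLTK's algorithms
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical lookup table standing in for a WordNet-style dictionary
LEMMA_TABLE = {"fixes": "fix", "fixed": "fix", "better": "good", "ran": "run"}


def toy_stem(word):
    """Strip the first matching suffix, mimicking rule-based stemming."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Map a word to its dictionary form if known, else leave it unchanged."""
    return LEMMA_TABLE.get(word, word)


for word in ["fixes", "fixed", "better", "ran"]:
    print(word, toy_stem(word), toy_lemmatize(word))
```

Regular inflections are handled by both approaches, while irregular forms such as "better" or "ran" are only resolved by the dictionary lookup.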

In [None]:
stemmer = SnowballStemmer("english")
lemmatizer = WordNetLemmatizer()


def stematize_or_lemmatizer_input(is_stem=True):
    if is_stem:
        data["Adapted_Changes"] = data["Tokens"].apply(
            lambda tokens: [stemmer.stem(word) for word in tokens if isinstance(word, str)]
        )
    else:
        nltk.download('wordnet')
        data["Adapted_Changes"] = data["Tokens"].apply(
            lambda tokens: [lemmatizer.lemmatize(word) for word in tokens]
        )


# Here we remove all categories which have only one instance in the dataset, since this cannot work during the test split.
# category_counts = data['General Category'].value_counts()
#
# categories_with_multiple_occurrences = category_counts[category_counts > 1].index
#
# data = data[data['General Category'].isin(categories_with_multiple_occurrences)]

stematize_or_lemmatizer_input()
print("Adapted tokens: ")
print(data[["Tokens", "Adapted_Changes"]].head())

### 3. Input processing
After cleaning the input, the next step is to extract features that are suitable for the machine learning models. For this, we use TF-IDF (Term Frequency-Inverse Document Frequency), which weights a term higher the more often it occurs in a document and the fewer documents contain it.
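
The weighting can be reproduced by hand on a small example. The sketch below follows scikit-learn's default formulation (raw term counts, smoothed IDF, L2 normalisation) but is restricted to unigrams, whereas the notebook configures `TfidfVectorizer` with n-grams up to length 3 and at most 1000 features. The three documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, already pre-processed documents
docs = [["fix", "bug", "test"], ["add", "test"], ["fix", "test", "test"]]

n_docs = len(docs)
vocabulary = sorted({term for doc in docs for term in doc})
doc_freq = Counter(term for doc in docs for term in set(doc))

# Smoothed IDF, matching scikit-learn's default: ln((1 + n) / (1 + df)) + 1
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}


def tfidf_vector(doc):
    """Return the L2-normalised TF-IDF weights of one document."""
    counts = Counter(doc)
    raw = [counts[t] * idf[t] for t in vocabulary]
    norm = math.sqrt(sum(w * w for w in raw))
    return [w / norm for w in raw]


for doc in docs:
    print([round(w, 3) for w in tfidf_vector(doc)])
```

A term that appears in every document, such as "test" here, receives the minimum IDF of 1, so rarer terms like "bug" carry more weight per occurrence.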


In [None]:
def return_vectorizer(name_of_vectorizer="tf_idf"):
    if name_of_vectorizer == "tf_idf":
        return TfidfVectorizer(stop_words='english', max_features=1000, ngram_range=(1, 3))
    elif name_of_vectorizer == "bow":
        return CountVectorizer()

vectorizer = return_vectorizer()

print(data["Adapted_Changes"].head())

documents = data["Adapted_Changes"].apply(lambda tokens: ' '.join(tokens) if isinstance(tokens, list) else '')

# print(documents.head())
X = vectorizer.fit_transform(documents)

print(X.shape)
print(X)

### 4. Labels preparation
Once the data is prepared, the next step is to define the labels which will be used for categorization.
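
As a rough illustration of what label encoding does, the sketch below maps category names to integers in the same way `LabelEncoder` does, by sorting the unique classes and numbering them from 0. The category names are hypothetical and not taken from the datasets.

```python
# Hypothetical category names, used only for illustration
categories = ["Bug Fix", "New Feature", "Bug Fix", "Deprecation", "New Feature"]

# LabelEncoder sorts the unique classes and numbers them from 0
classes = sorted(set(categories))
class_to_label = {name: index for index, name in enumerate(classes)}

encoded = [class_to_label[name] for name in categories]
decoded = [classes[label] for label in encoded]

print(class_to_label)
print(encoded)
```

Because the mapping is derived from the sorted class names, the integer assigned to a category can differ between datasets that contain different sets of categories.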

In [None]:
label_encoder = LabelEncoder()
print(data["General Category"].head())

y = label_encoder.fit_transform(data["General Category"])
print(y)
# print(label_encoder.classes_)
original_categories = label_encoder.inverse_transform(y)
# print(original_categories)
original_class_names = label_encoder.classes_
category_to_label = zip(original_categories, y)
category_to_label = [unique_tpl for unique_tpl in (set(tuple(pair) for pair in category_to_label))]
category_to_label.sort(key=lambda x: x[1])
print(category_to_label)

class_support = np.bincount(y)

# A threshold of 0 keeps every class, since np.bincount never returns negative counts
classes_to_include = [cls for cls, support in enumerate(class_support) if support >= 0]
print(classes_to_include)

### 5. Split training and testing data
The next step involves preparing the training and testing datasets for the machine learning model. An 80/20 split is applied, allocating 80% of the data for training and 20% for testing. To ensure reproducibility of the results, a random seed of 42 is used. This guarantees that the data is split consistently across runs when using the same parameters and model configuration. Additionally, due to the imbalances in all examined datasets, a `RandomOverSampler` strategy is utilized to mitigate the problem. Oversampling is applied to the training split only, so the test set keeps its original class distribution.
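
The idea behind random oversampling can be sketched in plain Python. This is a simplified illustration, not the imbalanced-learn implementation: the helper `random_oversample` and the toy training data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=42):
    """Duplicate minority-class samples at random until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [sample for sample, label in zip(samples, labels) if label == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Hypothetical imbalanced training labels
train_x = ["a1", "a2", "a3", "a4", "b1", "c1", "c2"]
train_y = ["A", "A", "A", "A", "B", "C", "C"]

res_x, res_y = random_oversample(train_x, train_y)
print(Counter(res_y))
```

No new information is created: minority samples are only repeated, which is why the resampled data should never leak into the test set.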



In [None]:
from imblearn.over_sampling import RandomOverSampler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

ros = RandomOverSampler(random_state=42)
X_train_res, y_train_res = ros.fit_resample(X_train, y_train)

print(pd.Series(y_train_res).value_counts())
print(pd.Series(y_train).value_counts())
print(pd.Series(y_test).value_counts())

print("Training data shape: ", X_train.shape)
print("Test data: ", X_test.shape)


### Parameter Tuning
To improve the general performance of the models, we tune their hyper-parameters. For this, a set of candidate values per parameter is passed to a cross-validated search (stratified 10-fold, scored by macro F1), and the best-performing estimator is returned.
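
To get a feeling for the cost of an exhaustive search, the sketch below enumerates the candidate configurations for the Random Forest grid that the notebook defines in section 6.1. The grid is copied from that section; the enumeration with `itertools.product` is only an illustration of what a grid search iterates over, not scikit-learn's internal code.

```python
from itertools import product

# Copy of the Random Forest grid defined in section 6.1
param_grid_rfc = {
    "criterion": ["gini", "log_loss"],
    "n_estimators": [100, 500, 1000],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 3],
    "class_weight": [None, "balanced", "balanced_subsample"],
    "random_state": [42],
}

# Every combination of values is one candidate configuration
candidates = [
    dict(zip(param_grid_rfc, values))
    for values in product(*param_grid_rfc.values())
]

n_folds = 10  # matches StratifiedKFold(n_splits=10)
print(len(candidates), "candidates,", len(candidates) * n_folds, "cross-validation fits")
```

The number of fits grows multiplicatively with every added parameter value, which is one reason to keep the grids small or to fall back to the randomized search.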

In [None]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold

print(data['General Category'].value_counts())
cvs = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

def return_best_params(hyper_parameter_type, model, params):
    if hyper_parameter_type == "Grid":
        search_type = GridSearchCV(estimator=model, param_grid=params, cv=cvs, n_jobs=-1, verbose=2, scoring='f1_macro')
    elif hyper_parameter_type == "Random":
        search_type = RandomizedSearchCV(estimator=model, param_distributions=params, cv=cvs, n_iter=500,
                                         scoring='f1_macro',
                                         n_jobs=-1, random_state=42)
    else:
        # Fail early instead of hitting an unbound `search_type` below
        raise ValueError(f"Unknown hyper_parameter_type: {hyper_parameter_type!r}")

    search_type.fit(X_train_res, y_train_res)
    print(search_type.best_params_)

    return search_type.best_estimator_


### 6. Train model
After the data is split and prepared, we train the classifiers. We use three base ML models (Random Forest, Logistic Regression, and SVC) and two ensembles built on top of them (stacking and voting). The following code blocks train each model; accuracy, precision, recall, and F1-score are computed in the performance comparison section.

#### 6.1. RandomForestClassifier



In [None]:
from sklearn.preprocessing import label_binarize
from sklearn.metrics import RocCurveDisplay
# Desired Thresholds for metrics:
# Accuracy -> above 70%
# Precision -> above 70%
# Recall -> above 60%
# F1-score -> above 55%
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
param_grid_rfc = {
    'criterion': ['gini', 'log_loss'],
    'n_estimators': [100, 500, 1000],
    'max_depth': [None, 5, 10],
    'min_samples_leaf': [1, 3],
    'class_weight': [None, 'balanced', 'balanced_subsample'],
    'random_state': [42]
}

# rfc = RandomForestClassifier(class_weight="balanced", n_estimators=500, random_state=42, min_samples_split=10)
rfc = return_best_params("Grid", RandomForestClassifier(), param_grid_rfc)
rfc.fit(X_train_res, y_train_res)
y_pred_rfc = rfc.predict(X_test)

#### 6.2. Logistic Regression


In [None]:
param_grid_lrc = {
    'C': [0.01, 0.1, 1, 10, 100],
    'solver': ['saga', 'liblinear', 'newton-cg', 'lbfgs'],
    'max_iter': [2000, 5000, 10000],
    'class_weight': [None, 'balanced'],
    'random_state': [42]
}

lrc = return_best_params("Grid", LogisticRegression(), param_grid_lrc)
lrc.fit(X_train_res, y_train_res)
y_pred_lrc = lrc.predict(X_test)

#### 6.3. SVC


In [None]:
# param_grid_svc = {
#     'C': [0.01, 0.1, 1, 10, 100],
#     'gamma': ['scale', 'auto'],
#     'kernel': ['linear', 'poly', 'rbf', 'sigmoid', ],
#     'class_weight': [None, 'balanced'],
#     'max_iter': [2000, 5000, 10000],
#     'random_state': [42]
# }

param_grid_svc = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto'],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
    'class_weight': [None, 'balanced'],
    'max_iter': [5000, 10000],
    'random_state': [42]
}

svc = return_best_params("Grid", SVC(), param_grid_svc)
svc.fit(X_train_res, y_train_res)
y_pred_svc = svc.predict(X_test)

#### 6.4 Stacking

In [None]:
from sklearn.ensemble import RandomForestClassifier

from sklearn.ensemble import StackingClassifier

meta_model = RandomForestClassifier(n_estimators=100, random_state=42)
sclf = StackingClassifier(
    estimators=[('rfc', rfc), ('lrc', lrc), ('svc', svc), ],
    final_estimator=meta_model
)
sclf.fit(X_train_res, y_train_res)
y_pred_sclf = sclf.predict(X_test)


#### 6.5 Voting

In [None]:
from sklearn.ensemble import VotingClassifier

vc = VotingClassifier(
    estimators=[('rfc', rfc), ('lrc', lrc), ('svc', svc)],
    voting='hard'
)
vc.fit(X_train_res, y_train_res)
y_pred_vc = vc.predict(X_test)

### Performance Comparisons
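In this section, we compare the performance of the different models, using accuracy, recall, precision, F1-score, and the confusion matrix. Precision, recall, and F1-score are macro-averaged, so every category contributes equally regardless of its size.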
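
Macro averaging can be illustrated with a small hand computation. The function `macro_scores` and the labels below are hypothetical; the sketch treats undefined ratios as 0, which corresponds to the `zero_division=0` setting used in the notebook.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1), treating 0/0 as 0."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels: class 0 dominates, class 2 is never predicted
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = macro_scores(y_true, y_pred)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```

In this example the accuracy (5 of 7 correct) looks considerably better than the macro-averaged scores, because the missed minority class pulls each macro average down by a full third.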

In [None]:
def compute_metrics(option_for_average, y_pred):
    accuracy = accuracy_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred, average=option_for_average, zero_division=0)
    precision = precision_score(y_test, y_pred, average=option_for_average, zero_division=0)
    f1 = f1_score(y_test, y_pred, average=option_for_average, zero_division=0)
    return accuracy, f1, precision, recall


def print_metrics(accuracy, f1, precision, recall):
    print("Accuracy:", accuracy)
    print("Recall:", recall)
    print("Precision:", precision)
    print("F1-score:", f1)


def convert_to_df(classifiers_names, y_preds, option_for_average="macro", preprocessing_technique="Stem"):
    models = []
    accuracy_list = []
    f1_list = []
    precision_list = []
    recall_list = []

    for classifier_name, y_pred in zip(classifiers_names, y_preds):
        accuracy, f1, precision, recall = compute_metrics(option_for_average, y_pred)

        models.append(classifier_name)
        accuracy_list.append(accuracy)
        f1_list.append(f1)
        precision_list.append(precision)
        recall_list.append(recall)

    results_df = pd.DataFrame({
        "Classification name": models,
        "Accuracy": accuracy_list,
        "Precision": precision_list,
        "Recall": recall_list,
        "F1-score": f1_list,
        # "Preprocessing technique": preprocessing_technique
    })
    return results_df


def show_classification_report(algorithm_name, y_pred):
    print("Results are for", algorithm_name.__class__.__name__)
    # classification_report returns a string, so it has to be printed explicitly
    print(classification_report(y_test, y_pred, labels=np.arange(len(original_class_names)), zero_division=0))


def convert_classification_report_to_dataframe(y_pred):
    report_dict = classification_report(y_test, y_pred, labels=np.arange(len(original_class_names)),
                                        target_names=original_class_names, zero_division=0, output_dict=True)
    report_df = pd.DataFrame(report_dict).transpose()
    report_df = report_df[report_df['support'] > 0]
    report_df.index.name = 'category'
    # print(report_df.to_string())
    return report_df


def plot_confusion_matrix(alg, y_pred):
    algorithm_name = alg.__class__.__name__
    print("Results are for", algorithm_name)
    # TODO: Extract this into function
    output_dir = "../resources/confusion-matrices"
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    file_path = os.path.join(output_dir, f"confusion_matrix_{dataset_name}_{algorithm_name}.png")

    cm = confusion_matrix(y_test, y_pred, labels=np.arange(len(original_class_names)))

    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=original_class_names,
                yticklabels=original_class_names, cbar=True)

    plt.title('Confusion Matrix', fontsize=16)
    plt.xlabel('Predicted', fontsize=12)
    plt.ylabel('True', fontsize=12)

    plt.xticks(rotation=45, ha='right')

    plt.tight_layout()
    plt.savefig(file_path, format='png')
    plt.show()

### Converter to Latex
Here we convert the obtained results directly to LaTeX tables, which can be pasted into the paper document.


In [None]:
def convert_to_table(df: pd.DataFrame, is_classification_report=True):
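    # Work on a copy so that the caller's DataFrame is not modified
    df = df.copy()
    if is_classification_report:
        df['support'] = df['support'].astype(int)

    latex_table_code = df.to_latex(
        caption=f"Results for {dataset_name}",
        label="tab:...",
        # One wide column for the index plus one narrow column per data column
        column_format="p{6cm}" + "p{2cm}" * len(df.columns),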
        index=True,
        header=True,
        float_format="{:0.3f}".format
    )
    return latex_table_code


### Obtain results
After configuring the generation of tables and classifying models, we simply pass the parameters to a function which generates the required results.

In [None]:
y_preds = [y_pred_rfc, y_pred_svc, y_pred_vc, y_pred_sclf]
classifiers = [
    rfc.__class__.__name__,
    svc.__class__.__name__,
    vc.__class__.__name__,
    sclf.__class__.__name__
]
models = [rfc, svc, vc, sclf]
metrics_df = convert_to_df(classifiers_names=classifiers, y_preds=y_preds, option_for_average="macro")

latex_output = convert_to_table(metrics_df, False)
path = '../resources/performance-metrics'
if not os.path.exists(path):
    os.makedirs(path)
file_path = os.path.join(path, f"metrics_{dataset_name}.txt")
with open(file_path, mode='w', encoding='utf-8') as f:
    f.write(latex_output)

In [None]:
for cls, pred in zip(models, y_preds):
    report_dataframe = convert_classification_report_to_dataframe(pred)
    latex_code = convert_to_table(report_dataframe)
    path = '../resources/tables'
    if not os.path.exists(path):
        os.makedirs(path)
    file_path = os.path.join(path, f"classification_report_{dataset_name}_{cls.__class__.__name__}.txt")
    with open(file_path, mode='w', encoding='utf-8') as f:
        f.write(latex_code)
    print(latex_code)

In [None]:
for cls, pred in zip(models, y_preds):
    plot_confusion_matrix(cls, pred)

### Combine Results
Here we combine the metric results of all datasets into one file, which we later visualise with violin plots.

In [None]:
import os
import pandas as pd

copy_metrics_df = metrics_df.copy()
copy_metrics_df["Dataset"] = dataset_name

combined_result = pd.DataFrame(columns=["Dataset", "Accuracy", "Precision", "Recall", "F1-score"])

if not copy_metrics_df.empty:
    combined_result = pd.concat([combined_result, copy_metrics_df], ignore_index=True)

path = '../resources/combined-metric-results'
file_path = os.path.join(path, "combined_results.txt")

if not os.path.exists(path):
    os.makedirs(path)

if os.path.exists(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        first_line = file.readline()
        if first_line.strip() == "":
            results_df = pd.DataFrame(columns=["Dataset", "Accuracy", "Precision", "Recall", "F1-score"])
        else:
            results_df = pd.read_csv(file_path, sep='\t')
else:
    results_df = pd.DataFrame(columns=["Dataset", "Accuracy", "Precision", "Recall", "F1-score"])

if not copy_metrics_df.empty:
    # Replace any earlier rows of this dataset with the freshly computed metrics.
    # Assigning a single row via .loc would mismatch the column count and keep only one classifier.
    results_df = results_df[results_df["Dataset"] != dataset_name]
    results_df = pd.concat([results_df, copy_metrics_df], ignore_index=True)

with open(file_path, mode='w', encoding='utf-8') as f:
    results_df.to_csv(f, sep='\t', index=False)

print(results_df)


### 7 Result Analysis
Here we plot violin plots to illustrate the data distribution. We first show the results per metric and dataset, and then per metric for the accumulated score.

In [None]:
# Plot the violin plots per metric and model here
# First we show per metric per dataset

# with open('../resources/combined-metric-results/combined_results.txt', mode='r', encoding='utf-8') as f:
#     violin_df = pd.read_csv(f, sep='\t')
#
# print(violin_df)
# plt.figure(figsize=(10, 6))
# sns.violinplot(data=violin_df, split=False, palette='Set2', legend='full')
# plt.savefig('../resources/violin-plots/violin-plot.png', format='png')
# plt.show()