Commit: paper version

angeloschatzimparmpas committed Feb 3, 2023
1 parent 91b9f74 commit 9c4d632
Showing 62 changed files with 155 additions and 3 deletions.
20 changes: 20 additions & 0 deletions README.md
@@ -60,5 +60,25 @@ FLASK_APP=run.py flask run

Then, open your browser and point it to `localhost:8080`. We recommend using an up-to-date version of Google Chrome.

# Reproducibility of the Results #
The following instructions describe how to reproduce the results presented in Figure 1 of the article. This figure corresponds to Section 5.2 (*Use case: explorative sampling for better classification*), the main use case described in the paper.

**Note:** We used OSX and Google Chrome in all our tests, so we cannot guarantee that everything works in other operating systems or browsers. However, since HardVis is written in JS and Python, it should work on all common platforms.

**Tip:** A red loading bar appears at the very top of your browser whenever something is processing.

**Tip:** Our [demonstration video](https://vimeo.com/772796696) also presents the following steps, using the same data set (from 02:04 until 08:00).

- Step 1: Make sure the "Vehicle Silhouette" data set is selected (top-left corner), then reload/refresh the `localhost:8080` page open in your browser. **Please note** that the first time you execute the analysis and, consequently, run the hyperparameter search, it might take a few minutes for the XGBoost classifier's hyperparameters to be tuned with Bayesian optimization. After the first run, the results are cached and re-used to make the process faster.
- Step 2: When the *Data Space* is populated with the data points, click on the stacked bar chart with value *13* for the *Number of Neighbors*, as shown in Figure 6(a).
- Step 3: We continue by selecting *Undersampling (US)* from the *Data Sets and Sampling Techniques* panel, and then clicking on the *OSS* option to activate this undersampling algorithm.
- Step 4: After the loading process is over, we set the *Seeds* value to *250* (see Figure 6(c)). Afterward, we choose the value *125* for the same parameter (cf. Figure 6(d)).
- Step 5: At this point, we click on *Rare* under the *Types* parameter to deactivate the algorithm's application to these instance types. In Figure 6(f), we can observe the result of this action. After everything reloads, we click on the *Outlier* type to deactivate this particular type, too (visible through the removal of the *tick* symbol).
- Step 6: Next, we select all data points in the *Data Space* view by holding down the left mouse button and moving the mouse to surround them. This uses the lasso functionality implemented in HardVis, with dashed lines appearing in the *Data Space* view. After waiting until the dashed lines disappear, we press the *Execute Undersample* button in the same view.
- Step 7: Afterward, we try another undersampling phase. Thus, we click the *OSS* button again to repeat the process. Since the results become worse, we completely deactivate this undersampling algorithm by clicking the *Disabled* option. Please wait until the red loading bar at the very top is no longer visible.
- Step 8: To reach the view shown in Figure 1, we switch to *Oversampling (OS)* and click the *SMOTE* option to activate this oversampling algorithm, as illustrated in Figure 1(a). Please wait until everything loads. Finally, we deactivate the *Outlier* option from the *Types* parameter.

**Outcome:** By following the process above, you should be able to reproduce precisely the results presented in Figure 1 of the paper. Thank you for your time!

# Corresponding Author #
For any questions regarding the implementation or the paper, feel free to contact [Angelos Chatzimparmpas](mailto:angelos.chatzimparmpas@lnu.se).
Binary file added __pycache__/DBCV.cpython-38.pyc
Binary file added __pycache__/__init__.cpython-38.pyc
Binary file added __pycache__/run.cpython-38.pyc
Binary file added __pycache__/run.cpython-39.pyc
@@ -0,0 +1 @@
{"duration": 151.95722007751465, "input_args": {}}
Binary file not shown.
92 changes: 92 additions & 0 deletions cachedir/joblib/run/callKSearch/func_code.py
@@ -0,0 +1,92 @@
# first line: 480
@memory.cache
def callKSearch ():
    print('findKValueNow!!!')
    global countPercentageList
    countPercentageList = []
    global storeAllMetricsList
    storeAllMetricsList = []
    global sortShepCorrList
    sortShepCorrList = []
    global GatherSafe
    GatherSafe = []
    global GatherBorder
    GatherBorder = []
    global GatherRare
    GatherRare = []
    global GatherOut
    GatherOut = []
    global UMAPModalStore
    UMAPModalStore = []
    global MaxValue
    MaxValue = []
    global MaxIndex
    MaxIndex = []

    kValuesAll = [5,6,7,8,9,10,11,12,13]
    mDistanceAll = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
    dataNP = XData.to_numpy()
    D_highSpace = distance.squareform(distance.pdist(dataNP))

    for val in kValuesAll:
        safeIndCounter = []
        borderlineIndCounter = []
        rareIndCounter = []
        outIndCounter = []
        countPercentage = []

        nbrs = NearestNeighbors(n_neighbors=val, metric="euclidean", n_jobs = -1).fit(XData)
        distances, indices = nbrs.kneighbors(XData)

        # count, for each instance, how many of its k neighbors share its label
        # (start at -1 because the instance itself is among its own neighbors)
        summarizePerc = []
        for idx, el in enumerate(indices):
            computePer = -1
            for each in el:
                if (yData[idx] == yData[each]):
                    computePer = computePer + 1
            summarizePerc.append(computePer)

        # categorize each instance as safe / borderline / rare / outlier
        for i, el in enumerate(summarizePerc):
            if (el >= (0.8 * val)):
                safeIndCounter.append(i)
            elif (el >= (0.5 * val)):
                borderlineIndCounter.append(i)
            elif (el >= (0.2 * val)):
                rareIndCounter.append(i)
            else:
                outIndCounter.append(i)

        percsafeIndCounter = len(safeIndCounter) / (len(safeIndCounter)+len(borderlineIndCounter)+len(rareIndCounter)+len(outIndCounter))
        percborderlineIndCounter = len(borderlineIndCounter) / (len(safeIndCounter)+len(borderlineIndCounter)+len(rareIndCounter)+len(outIndCounter))
        percrareIndCounter = len(rareIndCounter) / (len(safeIndCounter)+len(borderlineIndCounter)+len(rareIndCounter)+len(outIndCounter))
        percoutIndCounter = len(outIndCounter) / (len(safeIndCounter)+len(borderlineIndCounter)+len(rareIndCounter)+len(outIndCounter))

        countPercentage.append(percsafeIndCounter*100)
        countPercentage.append(percborderlineIndCounter*100)
        countPercentage.append(percrareIndCounter*100)
        countPercentage.append(percoutIndCounter*100)

        countPercentageList.append(countPercentage)

        # pick the UMAP min_dist with the best Shepard diagram correlation
        metricShepCorr = []
        for dis in mDistanceAll:
            SearchUMAP = FunUMAPAll(XData, val, dis)
            D_lowSpace = distance.squareform(distance.pdist(SearchUMAP))
            resultShep = shepard_diagram_correlation(D_highSpace, D_lowSpace)
            metricShepCorr.append(resultShep*100)
        storeAllMetricsList.append(metricShepCorr)
        sortShepCorr = sorted(range(len(metricShepCorr)), key=lambda k: metricShepCorr[k], reverse=True)[0]
        sortShepCorrList.append(sortShepCorr)

        max_value = max(metricShepCorr)
        max_index = metricShepCorr.index(max_value)
        UMAPModal = FunUMAP(XData, val, mDistanceAll[max_index])
        UMAPModalStore.append(UMAPModal)
        GatherSafe.append(safeIndCounter)
        GatherBorder.append(borderlineIndCounter)
        GatherRare.append(rareIndCounter)
        GatherOut.append(outIndCounter)
        MaxValue.append(max_value)
        MaxIndex.append(max_index)

    return [countPercentageList,sortShepCorrList,storeAllMetricsList,UMAPModalStore,GatherSafe,GatherBorder,GatherRare,GatherOut,MaxValue,MaxIndex]
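The core of `callKSearch` — labeling each instance safe, borderline, rare, or outlier from the fraction of its k nearest neighbors that share its label (thresholds 0.8, 0.5, 0.2) — can be exercised standalone. The helper below is our own illustration (function name and toy data are hypothetical), not code from the repository:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def categorize_instances(X, y, k):
    """Label each sample by the share of its k nearest neighbors
    (the query set includes the point itself) that agree on the class."""
    nbrs = NearestNeighbors(n_neighbors=k).fit(X)
    _, indices = nbrs.kneighbors(X)
    labels = []
    for i, neigh in enumerate(indices):
        # subtract 1 so the point is not counted as its own neighbor
        same = sum(y[i] == y[j] for j in neigh) - 1
        if same >= 0.8 * k:
            labels.append('safe')
        elif same >= 0.5 * k:
            labels.append('borderline')
        elif same >= 0.2 * k:
            labels.append('rare')
        else:
            labels.append('outlier')
    return labels
```

Because scikit-learn returns the query point among its own neighbors at distance zero, one match is subtracted, mirroring the `computePer = -1` initialization in the cached function above.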
@@ -0,0 +1 @@
{"duration": 3.770894765853882, "input_args": {"n_estimators": "122.21742728992572", "eta": "0.06452090304204987", "max_depth": "11.197056874649611", "subsample": "0.941614515559209", "colsample_bytree": "0.8311989040672406"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 3.1827189922332764, "input_args": {"n_estimators": "121.73840441842214", "eta": "0.28767857660247903", "max_depth": "10.39196365086843", "subsample": "0.8312037280884873", "colsample_bytree": "0.8749080237694725"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.5318150520324707, "input_args": {"n_estimators": "114.44905352605177", "eta": "0.22831119680574874", "max_depth": "10.564710291701385", "subsample": "0.9541934359909122", "colsample_bytree": "0.8239188491876603"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.116663694381714, "input_args": {"n_estimators": "72.47197879739036", "eta": "0.10844857557015576", "max_depth": "6.3014333988327635", "subsample": "0.9021891613488681", "colsample_bytree": "0.9944126787690761"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 1.6168720722198486, "input_args": {"n_estimators": "46.40612658226385", "eta": "0.2924774630404986", "max_depth": "10.99465584480253", "subsample": "0.8363649934414201", "colsample_bytree": "0.8041168988591605"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 3.2985379695892334, "input_args": {"n_estimators": "164.01497854869265", "eta": "0.0996789203835431", "max_depth": "6.033132702741614", "subsample": "0.9413714687695234", "colsample_bytree": "0.9544489538593315"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.4228010177612305, "input_args": {"n_estimators": "112.07270186497003", "eta": "0.14082774572514967", "max_depth": "11.929172318453045", "subsample": "0.9543434280122952", "colsample_bytree": "0.9863189678041957"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 1.3942599296569824, "input_args": {"n_estimators": "74.90081706613316", "eta": "0.24281758667148645", "max_depth": "6.444267910404542", "subsample": "0.823173811905026", "colsample_bytree": "0.9458014336081975"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.0340428352355957, "input_args": {"n_estimators": "123.31233076168427", "eta": "0.3", "max_depth": "8.868729253728564", "subsample": "0.8", "colsample_bytree": "0.8478619789553219"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 0.4379770755767822, "input_args": {"n_estimators": "13.819321337554923", "eta": "0.07212312551297988", "max_depth": "7.175897174514871", "subsample": "0.8650660661526529", "colsample_bytree": "0.9843748470046234"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.51269793510437, "input_args": {"n_estimators": "179.94824742138775", "eta": "0.26054377863611405", "max_depth": "7.776976381564504", "subsample": "0.9240763899531264", "colsample_bytree": "0.9165355749433655"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.693844795227051, "input_args": {"n_estimators": "138.4254401698706", "eta": "0.1261534422933427", "max_depth": "6.586032684038303", "subsample": "0.8880304987479203", "colsample_bytree": "0.9616794696232922"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 3.362119197845459, "input_args": {"n_estimators": "179.4913333333915", "eta": "0.24378320584027863", "max_depth": "11.636993649385134", "subsample": "0.919579995762217", "colsample_bytree": "0.9939169255529117"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 1.0949859619140625, "input_args": {"n_estimators": "17.393878305774606", "eta": "0.20582453170688947", "max_depth": "7.985388149115895", "subsample": "0.8621964643431325", "colsample_bytree": "0.9726206851751187"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 3.5751378536224365, "input_args": {"n_estimators": "120.52084092809828", "eta": "0.09991844553958994", "max_depth": "9.08540663048167", "subsample": "0.8092900825439996", "colsample_bytree": "0.9570351922786027"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.3303470611572266, "input_args": {"n_estimators": "89.22927863521258", "eta": "0.12606056073988442", "max_depth": "9.148538589793427", "subsample": "0.8582458280396084", "colsample_bytree": "0.8366809019706868"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 0.18697500228881836, "input_args": {"n_estimators": "9.956729715098561", "eta": "0.18068320734549853", "max_depth": "8.565246110151298", "subsample": "0.821578285398661", "colsample_bytree": "0.8987591192728782"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 3.523658275604248, "input_args": {"n_estimators": "76.44055944226989", "eta": "0.08487346516301046", "max_depth": "7.752867891211309", "subsample": "0.8912139968434072", "colsample_bytree": "0.9223705789444759"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 3.6108059883117676, "input_args": {"n_estimators": "74.56689870524991", "eta": "0.11783725794347398", "max_depth": "10.972425054911575", "subsample": "0.8561869019374762", "colsample_bytree": "0.8777354579378964"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 1.841581106185913, "input_args": {"n_estimators": "114.22795618793002", "eta": "0.1910107263433683", "max_depth": "8.16101270435952", "subsample": "0.8926456235313323", "colsample_bytree": "0.8"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.5729548931121826, "input_args": {"n_estimators": "182.3174784053625", "eta": "0.17379422752781754", "max_depth": "6.20633112669131", "subsample": "0.8517559963200034", "colsample_bytree": "0.8244076469689559"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.7057480812072754, "input_args": {"n_estimators": "111.60850447193953", "eta": "0.12792776902235276", "max_depth": "9.120408127066865", "subsample": "0.8369708911051055", "colsample_bytree": "0.9325044568707964"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.622046947479248, "input_args": {"n_estimators": "190.03267976439997", "eta": "0.09263103092182289", "max_depth": "6.390309557911677", "subsample": "0.9931264066149119", "colsample_bytree": "0.9215089703802877"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 2.799971103668213, "input_args": {"n_estimators": "178.00648480238368", "eta": "0.232401544584516", "max_depth": "9.825344828131279", "subsample": "0.8944429850323898", "colsample_bytree": "0.8650366644053494"}}
Binary file not shown.
@@ -0,0 +1 @@
{"duration": 1.4785480499267578, "input_args": {"n_estimators": "19.53737551755531", "eta": "0.08523105624369066", "max_depth": "10.813181884524237", "subsample": "0.9973773873201035", "colsample_bytree": "0.9085392166316497"}}
Binary file not shown.
12 changes: 12 additions & 0 deletions cachedir/joblib/run/estimator/func_code.py
@@ -0,0 +1,12 @@
# first line: 464
@memory.cache
def estimator(n_estimators, eta, max_depth, subsample, colsample_bytree):
    # initialize model
    print('modelsCompNow!!!!!')
    n_estimators = int(n_estimators)
    max_depth = int(max_depth)
    model = XGBClassifier(n_estimators=n_estimators, eta=eta, max_depth=max_depth, subsample=subsample, colsample_bytree=colsample_bytree, n_jobs=-1, random_state=RANDOM_SEED, seed=RANDOM_SEED, silent=True, verbosity = 0, use_label_encoder=False)
    # set in cross-validation
    result = cross_validate(model, XData, yData, cv=crossValidation, scoring='accuracy')
    # result is mean of test_score
    return np.mean(result['test_score'])
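Because Bayesian optimization (the `bayesian-optimization` package in `requirements.txt`) proposes points in a continuous box, `estimator` truncates `n_estimators` and `max_depth` to integers before building the model. A stdlib-only stand-in — random search over hypothetical bounds, with a dummy objective in place of the cross-validated XGBoost score — illustrates the same cast-then-evaluate pattern; bounds, function names, and the objective are our assumptions, not values from the repository:

```python
import random

PBOUNDS = {  # hypothetical bounds mirroring the tuned hyperparameters
    'n_estimators': (5, 200),
    'eta': (0.05, 0.3),
    'max_depth': (6, 12),
    'subsample': (0.8, 1.0),
    'colsample_bytree': (0.8, 1.0),
}

def dummy_objective(n_estimators, eta, max_depth, subsample, colsample_bytree):
    # stand-in for the cross-validated accuracy; integer-valued parameters
    # are truncated exactly as in the cached estimator() above
    n_estimators = int(n_estimators)
    max_depth = int(max_depth)
    return 1.0 - abs(n_estimators - 100) / 200 - abs(max_depth - 9) / 12

def random_search(objective, pbounds, n_iter=50, seed=42):
    """Sample the continuous box uniformly and keep the best score."""
    rng = random.Random(seed)
    best_score, best_params = float('-inf'), None
    for _ in range(n_iter):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in pbounds.items()}
        score = objective(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

A Bayesian optimizer replaces the uniform sampling with a surrogate-guided proposal, but the objective side — casting the continuous proposals to integers before evaluation — stays the same.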
4 changes: 2 additions & 2 deletions frontend/src/components/DataSetSlider.vue
@@ -2,9 +2,9 @@
 <div>
 <label id="data" for="param-dataset" data-toggle="tooltip" data-placement="right" title="Tip: use one of the data sets already provided or upload a new file.">{{ dataset }}</label>
 <select id="selectFile" @change="selectDataSet()">
-<option value="VehicleC.csv" >Vehicle Silhouette</option>
+<option value="VehicleC.csv" selected>Vehicle Silhouette</option>
 <option value="breastC.csv" >Breast Cancer</option>
-<option value="IrisC.csv" selected>Iris Flower</option>
+<option value="IrisC.csv" >Iris Flower</option>
 </select>
 <button style="float: right;" class="btn-outline-dark"
 id="know"
4 changes: 3 additions & 1 deletion requirements.txt
@@ -2,10 +2,12 @@ pymongo~=3.11.0
 Flask~=1.1.2
 Flask-PyMongo~=2.3.0
 Flask-Cors~=3.0.9
-scipy~=1.5.2
+numpy~=1.21.4
 pandas~=1.1.2
+joblib~=1.1.0
 scikit-learn~=0.23.2
+scipy~=1.5.2
 xgboost~=1.3.3
 bayesian-optimization~=1.2.0
 umap-learn~=0.5.3
 imblearn~=0.0
Binary file added thumbnail_representative.png
