v-measure #19
base: master
Conversation
Your data also comes with labels. The labels are represented by a 3-tuple of values. Why can't you use a similar technique (e.g.
How are you doing this?
Yes. That is the way that all ML algorithms typically work. I think this is why we have been talking at cross-purposes for a few days.
I don't understand this statement. The score in the notebook is
Thanks for the clarification on
Yes, that's what I am doing
@shankari I have a question about user_input
and
?? If I remember correctly,
Correct. There is no difference. Both of those are invalid.
There should be very few entries like that.
I haven't counted all users.
For user[2]
For user[3]
For user[4], there are some NaN in both
At this point, I am going to get the v-measure for the original user_input, for the user_input after changing the language, and for the user_input after converting purposes and modes, on all bins, on bins above the cutoff, and on clusters above the cutoff, as @shankari suggested.
I don't recall seeing any. I double-checked the entries. I suspect this may be because of your mapping code, but you can try the code from the public dashboard to see if you still have them.
I didn't convert anything for this user; the output is from non_empty_trips.
So, should I map
@corinne-hcr I am not sure which user you are talking about; the order that you and I have may not be the same. I don't think we should share UUIDs in this public forum, and by design, the public dashboard only performs aggregate analysis. But I can confirm that I don't see any. Again, it would be helpful to commit your code here. My code is checked in and you can verify against it any time.
"for i in range(len(bins)):\n", | ||
" for trip in bins[i]:\n", | ||
" labels_pred.append(i)\n", | ||
"labels_pred" |
Although the actual labels can be anything (1,1,1,1 == 5,5,5,5), I am pretty sure that the metric expects labels_true and labels_pred to be in the same order - e.g. if labels_true is generated from trips [t1, t2, t3, t4], then labels_pred must be generated from the same trips in the same order.

Are you sure that your implementation here matches that?

While generating labels_true, you are iterating over the trips in bin_trips order.
While generating labels_pred, you are iterating bin by bin.

So it would seem that if the bins are (t2, t3) and (t1, t4), then labels_true would be in the order [t1, t2, t3, t4] and labels_pred would be in the order [t2, t3, t1, t4].

Maybe I don't fully recall the structure of bin_trips. But to be on the safe side, can you add verification to the notebook to show that the trips are actually being processed in the same order?
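To make the ordering requirement concrete, here is a small illustration (not code from this PR) that calls sklearn directly; the trips t1..t4 are hypothetical:

```python
# Minimal sketch: v-measure is invariant to renaming the label values, but it
# is NOT invariant to reordering one list relative to the other, because
# element i of labels_true and labels_pred must refer to the same trip.
from sklearn import metrics

# ground-truth labels for trips [t1, t2, t3, t4]
labels_true = [0, 1, 1, 0]

# renaming the cluster ids does not change the score
print(metrics.v_measure_score(labels_true, [5, 7, 7, 5]))  # 1.0

# but building labels_pred in a different trip order, e.g. [t2, t3, t1, t4],
# breaks the element-wise correspondence and changes the score
print(metrics.v_measure_score(labels_true, [7, 7, 5, 5]))  # 0.0
```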
I think they are in the same order.
You can see the bin_df - that's the trip order after binning. It follows the non_empty_trips indices in the bins.
The following is the list from which I collect labels_true; it is indeed ordered according to the indices in bin_trips.
I do consider the trip order. I think if bin_trips_user_input_ls has the same trip order as the trips in bin_df, then the way I collect labels should be consistent. Although I only show part of the trips, you can see from the picture that the trips are in the same order.
Is that clear?
I still don't see this.
The bin_user_input that you have listed above uses bins, not bin_trips, the same as labels_pred.
I think that your main argument is that the labels are the same through spot checking. That is good, but it could also happen by chance, and only for the first few entries. I certainly did not look at every single entry, and I am not sure I would be very accurate at doing so. This is the kind of checking that is much better for computers to do programmatically.
I would feel a lot more confident comparing the trip indices in the original list directly in code. Add an index to the original list (if it doesn't have one) and use it as the dataframe index for both labels_pred and labels_true, and verify that the indices are the same.
If you don't want to add an index, use the timestamp, which is also guaranteed to be unique.
Either way, use a unique value in the input, pass it through the pipeline, and confirm that it is the same.
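A sketch of what that check could look like (the names true_trip_ids and pred_trip_ids are hypothetical placeholders for whatever unique identifier is carried through each label-generation step):

```python
# Hedged sketch, not code from the PR: keep a unique per-trip identifier next
# to each generated label and compare the two orderings in code.
import pandas as pd

labels_true_df = pd.DataFrame({"label": labels_true}, index=true_trip_ids)
labels_pred_df = pd.DataFrame({"label": labels_pred}, index=pred_trip_ids)

# both labelings must have been built from the same trips in the same order
assert labels_true_df.index.equals(labels_pred_df.index), "trip order mismatch"
```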
The red list of bins above contains the original indices of the trips after Naomi filtered out the trips that are points. bin_trips is a new list that collects trips according to the indices in bins.
Which keyword is the timestamp that you want me to add?
trip["data"]["start_ts"] should work as a timestamp. Again, I don't want you to print it; I want you to check it programmatically.
That looks good, but it is not using the full power of pandas/numpy. Nevertheless, please add it to the PR.
Sure.
Did you investigate and determine the reason for the discrepancy?
The following is my investigation. In my file, the trips are from 2020/11 to 2021/1. Here is
So, you can see, the total number of labeled trips from your code is the same as from my code. I also got NaN output from your code. I think that's because the user didn't input a value.
Can you put the related UUID into the Teams chat? As I said earlier, I don't see
does not include
I think you should pick 2020/12 or 2021/1.
I'm not sure what you mean by that. I still need to know the user so that the indices match up, right?
I have already put the UUID in Teams. In 2020/11, there is no NaN for this user, but in 2020/12 and 2021/1, you can see NaN.
nvm, you are right! Instead of manually looking at the
And double-checking those entries from the dataframe does indicate that they are missing.
I think we should skip these entries for now. There are not many of them, and the user has only a few labeled trips to begin with, so I wasn't counting him in my mental training set anyway. You can just skip all users with a <50% labeling rate for this training and tuning stage.
Btw, this is the way Naomi collected
Here is how she collected the indices in bins
Here
Do you mean that, among all users' labeled trips, I should skip users that have <50% fully labeled trips across the three columns?
Correct. You will end up with ~6 users, I think.
We can add 4 other users from another project later; their confirmed labels will be a 2-tuple of
Can you please make sure to remove outputs before committing? I am finding this hard to review.
non_empty_trips = [t for t in trips if t["data"]["user_input"] != {}]
valid_trips = [t for t in non_empty_trips if 'mode_confirm' in t["data"]["user_input"] and
               'purpose_confirm' in t["data"]["user_input"] and 'replaced_mode' in t["data"]["user_input"]]
This is combining valid trip filtering and binning. In the next PR, I would pull out the valid trip filtering into a separate function so it can be reused across the various modules - v-score calculation, query calculation, future clustering, etc.
Also, is there a reason you are not using dropna here instead of hardcoding the entry checks?
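For reference, one way the dropna version could look (a sketch, not the PR's code; it assumes the user_input dicts can be expanded into a dataframe):

```python
import pandas as pd

# expand the user_input dicts into a dataframe, one row per trip
ui_df = pd.DataFrame([t["data"]["user_input"] for t in non_empty_trips])
# make sure the three label columns exist even if no trip contains them
ui_df = ui_df.reindex(columns=["mode_confirm", "purpose_confirm", "replaced_mode"])
# dropna keeps only the rows where all three confirmed labels are present
valid_trips = [non_empty_trips[i] for i in ui_df.dropna().index]
```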
if len(filter_trips) < 10:
    homo_score.append(NaN)
    comp_score.append(NaN)
    v_score.append(NaN)
    continue
Again, as we discussed, please combine this with the other validity checks.
if len(bin_trips) < 10:
    homo_score.append(NaN)
    comp_score.append(NaN)
    v_score.append(NaN)
    continue
Ditto
if len(filter_trips) < 10:
    homo_score.append(NaN)
    comp_score.append(NaN)
    v_score.append(NaN)
    continue
Ditto
if len(bin_trips) < 10:
    homo_score.append(NaN)
    comp_score.append(NaN)
    v_score.append(NaN)
    continue
Ditto
labels_true.append(no_dup_list.index(trip))
labels_pred = feat.labels

# compare the points in cluster_trips and those in feat.points, return nothing if two data frames are the same
Can you fix the comment here? Right now, if the frames are the same, we just go on to the score calculation.
What is the difference between the various notebooks here?
Can you add a description at the top of each notebook indicating what it does?
I would split the code into multiple functions so that they can be composed into a pipeline.
the functions would be:
- get_valid_trips
- map_labels (sp2en or cvt_pur_mo)
- valid_user_check
- fit_clusters (with two implementations, bin and cluster)
- get_pred_labels (two implementations?)
- get_user_labels (two implementations?)
- verify_clusters (with two implementations)
- compute scores
Then you have two pipelines, one with the bin implementations and the other with the cluster implementations, calling one after the other without any intermediate steps or repetition.
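A rough sketch of that composition (the function names follow the list above; the signatures, and the bin/cluster variant names in the commented calls, are placeholders rather than a final design):

```python
def run_pipeline(trips, fit, get_pred_labels, get_user_labels, verify):
    # common steps
    valid = map_labels(get_valid_trips(trips))
    if not valid_user_check(valid, trips):
        return None  # caller records NaN scores for this user
    # bin- or cluster-specific steps are passed in as functions
    model = fit(valid)
    labels_pred = get_pred_labels(model, valid)
    labels_true = get_user_labels(valid)
    verify(model, valid)
    return compute_scores(labels_true, labels_pred)

# bin_scores = run_pipeline(trips, fit_bins, bin_pred_labels, bin_user_labels, verify_bins)
# cluster_scores = run_pipeline(trips, fit_clusters, cluster_pred_labels, cluster_user_labels, verify_clusters)
```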
On a second look, the current pipelines are not too bad, since several of the steps will have separate implementations anyway. I will accept a simpler refactoring in which you pull out the common functionality. If you want to implement the cleaner, but more complex, refactoring, I won't complain!
I can pull out And what do you want me to do with
Not sure what this means. The code that uses
I was hoping you would finish the refactoring and add the descriptions for the other notebooks first so I could merge this.
For the scatter plot, the one I have is based on the bins above the cutoff. The trips are grouped only based on coordinates and radius. Back then, we compared the difference between bins and clusters (min_clusters = 0) and found that the bins are better than the clusters. But when we decided to use a second round of clustering, I changed to using k-means for the first round of clustering and set min_clusters = len(bins). So, should I just leave it like that for now? Later, when I have the results from hierarchical clustering, should I write separate code for the improved scatter plot?
Not sure about your question - are you asking whether you need to go back and change the results for the trip-end-only clusters? In general, for any results, you don't have to present only one set of results. You can present a set of results and pick the best one to explore further.
@corinne-hcr please get this PR to a reviewable state before continuing. It is very hard to review large chunks of code with no context. I'm sure that the code is clear to you, but I'd like (at least this time) for the code to be relatively clear to others as well.
That means digestible code chunks with lots of comments explaining what you are doing.
Please get this done before you commit your refactored code.
I added some comments. Please let me know if it is still not digestible.
def valid_user_check(filter_trips,trips,homo_score,comp_score,v_score):
    if not valid_user(filter_trips, trips):
        homo_score.append(NaN)
        comp_score.append(NaN)
        v_score.append(NaN)
        skip = True
    else:
        skip = False
    return homo_score,comp_score,v_score,skip
I'm not going to insist on this since you are planning to refactor anyway, but you don't need to have homo_score, comp_score and v_score as both input and output. You can create a class that encapsulates all the scores and just pass it in directly.
Also, I really don't think you need to have the append code in here - you can just have this function return valid or invalid. If invalid, append the values and continue; if valid, proceed with the computation.
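One possible shape for that (a sketch, not part of the PR; the class name is made up):

```python
from dataclasses import dataclass, field
from numpy import nan

@dataclass
class ClusteringScores:
    homo: list = field(default_factory=list)
    comp: list = field(default_factory=list)
    v: list = field(default_factory=list)

    def append_nan(self):
        # record a skipped user
        self.homo.append(nan)
        self.comp.append(nan)
        self.v.append(nan)

# caller side, with valid_user_check reduced to returning a boolean:
# if not valid_user_check(filter_trips, trips):
#     scores.append_nan()
#     continue
```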
Please file an issue to clean this up later.
return bin_date
# compare the trip orders in bin_trips with those in filter_trips above cutoff |
Improve the comment. What is "trip_orders"? Why does it matter?
Just repeating the function name in the comment doesn't add much.
for bin in bin_date:
    if day:
        if match_day(trip,bin,filter_trips):
            bin.append(trip_index)
            added = True
            break
    if month:
        if match_month(trip,bin,filter_trips):
            bin.append(trip_index)
            added = True
            break
It seems like there should be a better way to implement this in pandas using groupby. Please file an issue for this so you can explore how to clean it up later.
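A rough illustration of the groupby idea (a sketch; it assumes filter_trips can be flattened into a dataframe with start_ts in seconds, and it ignores timezone handling):

```python
import pandas as pd

ts = pd.to_datetime([t["data"]["start_ts"] for t in filter_trips], unit="s")
trip_df = pd.DataFrame({"trip_index": range(len(filter_trips))}, index=ts)

# one list of trip indices per calendar day; use index.to_period("M") for months
bins_by_day = trip_df.groupby(trip_df.index.date)["trip_index"].apply(list).tolist()
```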
return homo_score,comp_score,v_score

# This function is to compare a trip with a group of trips to see if they happened in a same day
What is the meaning of the parameters?
You have a bin and filter_trips. Why do you need to pass in both?
Why not just pass in two trips?
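For illustration, the two-trip version could be as simple as this sketch (hypothetical; it skips the local-timezone handling the real code would need):

```python
from datetime import datetime, timezone

def same_day(trip_a, trip_b):
    # compare only the calendar date of the two start timestamps
    a = datetime.fromtimestamp(trip_a["data"]["start_ts"], tz=timezone.utc)
    b = datetime.fromtimestamp(trip_b["data"]["start_ts"], tz=timezone.utc)
    return (a.year, a.month, a.day) == (b.year, b.month, b.day)
```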
" bin_day = evaluation.bin_date(req_trips_ls,filter_trips,day=True)\n", | ||
" req_day_ls = []\n", | ||
" for bin in bin_day:\n", | ||
" req_day_ls.append(len(bin))\n", | ||
" \n", |
Please add a comment here that describes what bin_day is expected to look like, to make the rest of this easier to review.
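Based on the bin_date implementation quoted later in this PR, the comment might say something along these lines (a suggestion, not the author's wording):

```python
# bin_day: output of evaluation.bin_date(..., day=True); a list of bins,
# where each bin is a list of trip indices whose trips started on the same
# calendar day, e.g. [[0, 3, 4], [1, 2], [5]]
```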
trip = filter_trips[trip_index]

for bin in bin_date:
    if day:
        if match_day(trip,bin,filter_trips):
            bin.append(trip_index)
            added = True
            break
    if month:
        if match_month(trip,bin,filter_trips):
            bin.append(trip_index)
            added = True
            break

if not added:
    bin_date.append([trip_index])
This code is just so messy and non-pythonic
Please add this to the list of areas to improve the implementation.
" if evaluation.match_day(req_trip,valid_trips_bin,filter_trips):\n", | ||
" proportion = round(len(req_trips_bin)/len(valid_trips_bin), 2)\n", | ||
" propor_single_user.append(proportion)\n", | ||
" match = True\n", | ||
" break\n", |
This is the problem with the current match_day implementation. Reading this, it appears that you compare one trip to a list to find a match, and then you use it to determine the proportion for that day for that user. But how is comparing only the first trip OK? What is valid_trips_bin?
This needs more comments. It took me 10 minutes of toggling between the two implementations to figure this out, and that is too much time to spend on what is supposed to be fairly simple code.
My question for the v-measure was what to put in labels_true and labels_pred. (But I have some clues now.)

metrics.homogeneity_score(labels_true, labels_pred)
metrics.completeness_score(labels_true, labels_pred)
metrics.v_measure_score(labels_true, labels_pred)
plot_kmeans_digits and plot_document_clustering are two examples from sklearn. I added some test code to print some of the intermediate output. In plot_document_clustering, the author uses k-means for clustering, but the data come with labels, so he uses the number of distinct labels to determine the number of clusters. Then he calls KMeans and runs the v-measure like this:
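(The screenshots are not preserved in this thread; the block below is only a paraphrase of what the sklearn plot_document_clustering example roughly does, where X and labels are that example's own data and ground-truth labels.)

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import KMeans

# the number of clusters comes from the number of distinct ground-truth labels
true_k = np.unique(labels).shape[0]
km = KMeans(n_clusters=true_k)
km.fit(X)

# the metrics compare the ground-truth labels against km.labels_
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels, km.labels_))
print("Completeness: %0.3f" % metrics.completeness_score(labels, km.labels_))
print("V-measure: %0.3f" % metrics.v_measure_score(labels, km.labels_))
```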
labels is a 1D ndarray, and so is km.labels_; the size of the array is the size of the whole dataset (so I guess I need to pass in the labels for all of the data, instead of passing in the labels from one bin at a time and then calculating the mean score). However, in the documentation examples, the author passes in lists for both labels_true and labels_pred.

Since the author didn't set random_state in KMeans, the Homogeneity, Completeness, and V-measure are different every time (the labels from clustering are different every time).

In the documentation, there is a sentence - "This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won't change the score value in any way." I understand what this means now. For example, permuting the cluster label values gives the same score.

The contents above are what I see from the example code. Here are my questions:

For homogeneity_score: in the documentation page, there are some examples of homogeneity_score, but I don't understand them. I tried a similar test in the notebook. The homogeneity_score should be bounded in [0,1], but if I run it on my data, this circumstance won't happen, since there is more than 1 bin/cluster. I just don't understand why this would happen.

Now, I am working on labeling the ground truth based on the user_input (on the data above the cutoff) as labels_true. I will give the same label to trips with the same user_input. Also, I will label the trips in bins according to the bin index as labels_pred. Finally, I will put them into the function.
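For reference, the bounds question can be reproduced with tiny hand-made labelings (the values follow the sklearn documentation examples):

```python
from sklearn import metrics

metrics.homogeneity_score([0, 0, 1, 1], [0, 0, 1, 1])  # 1.0 - every cluster contains a single class
metrics.homogeneity_score([0, 0, 1, 1], [0, 1, 2, 3])  # 1.0 - over-split, but each cluster is still pure
metrics.homogeneity_score([0, 0, 1, 1], [0, 0, 0, 0])  # 0.0 - one cluster mixes both classes
```

And a sketch of the labeling plan just described (hypothetical variable names, mirroring the no_dup_list pattern used elsewhere in this PR):

```python
# ground truth: trips with identical user_input dicts share a label
user_inputs = [t["data"]["user_input"] for t in bin_trips]
no_dup_list = []
for ui in user_inputs:
    if ui not in no_dup_list:
        no_dup_list.append(ui)
labels_true = [no_dup_list.index(ui) for ui in user_inputs]

# predicted labels: every trip in bin i gets label i
labels_pred = []
for i, curr_bin in enumerate(bins):
    labels_pred.extend([i] * len(curr_bin))
```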