
Output the testing results of many images as text file #70

Open
tmquan opened this issue Apr 15, 2015 · 10 comments

Comments

@tmquan

tmquan commented Apr 15, 2015

Thanks to DIGITS, the new "image_classification_model_classify_many" feature works well.

I just wonder: is there any way to retrieve that result as a text file from this part, so the output can be analyzed systematically with pandas? It would be nice if DIGITS could save a dataframe of the testing images somewhere in its job's directory.

classifications = []
for image_index, index_list in enumerate(indices):
    result = []
    for i in index_list:
        # `i` is a category in labels and also an index into scores
        result.append((labels[i], round(100.0 * scores[image_index, i], 2)))
    classifications.append(result)
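For reference, a minimal sketch of what saving those results could look like, assuming `paths` holds one image path per entry in `classifications` (the function name and CSV layout here are my own, not part of DIGITS):

```python
import csv

def save_classifications(classifications, paths, output_path):
    # Flatten the per-image (label, confidence) pairs into one CSV row
    # per prediction, so the file loads cleanly with pandas.read_csv().
    # `paths` is a hypothetical list of image paths, one per image.
    with open(output_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['image', 'rank', 'label', 'confidence'])
        for path, result in zip(paths, classifications):
            for rank, (label, confidence) in enumerate(result, start=1):
                writer.writerow([path, rank, label, confidence])
```

The resulting file can then be read back with `pandas.read_csv(output_path)` for analysis.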
@VrUnRealEngine4

Did you have a look at my code in #61? That is how I directly store the results to a CSV file....

@tmquan
Author

tmquan commented Apr 15, 2015

Thanks, @michael-george-hart.

I used the same strategy to output the results to a text file or CSV, but it is really messy and I don't want to alter the beautiful structure of DIGITS.

I just wonder whether this feature could make it into the next version, because it would be very convenient for analyzing the test dataset.

On the other hand, since the list of test data is often quite large (mine has around 1 million images), it would be nice to have a status display directly in the web browser, alongside the terminal info log.

Regards,

@VrUnRealEngine4

Totally agree .... @lukeyeager has done a really good job on DIGITS in terms of clear thought and design/implementation .... it is really not that hard to add the additional features you desire if you look at #61

@lukeyeager
Member

@tmquan, so you'd like to be able to download the "Classify Many" results as a .csv? That sounds pretty doable. As I mentioned here, you should be able to copy and paste the text into Excel and get your data fairly easily. But I agree that this would be a nice enhancement.

@michael-george-hart, you sound eager to help with this. If you could embed the data on the results page in the CSV format and provide a "Download Results" button that would be awesome. That would be much more user friendly than Tran's suggestion:

save the dataframe of testing images somewhere in its job's directory
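A rough sketch of what such a download endpoint could look like (the route path, helper name, and fixed sample data below are assumptions for illustration; in DIGITS the results would come from the actual job, not a hard-coded list):

```python
import csv
import io
from flask import Flask, Response

app = Flask(__name__)

def classifications_to_csv(classifications, paths):
    # Serialize the per-image (label, confidence) results
    # into an in-memory CSV string.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(['image', 'label', 'confidence'])
    for path, result in zip(paths, classifications):
        for label, confidence in result:
            writer.writerow([path, label, confidence])
    return buf.getvalue()

@app.route('/models/images/classification/classify_many/download')
def download_results():
    # Hypothetical sample data; a real implementation would pull
    # the results from the job that ran "Classify Many".
    classifications = [[('cat', 90.0), ('dog', 10.0)]]
    csv_text = classifications_to_csv(classifications, ['img0.jpg'])
    return Response(csv_text, mimetype='text/csv',
                    headers={'Content-Disposition':
                             'attachment; filename=classify_many.csv'})
```

The `Content-Disposition: attachment` header is what makes the browser show a download dialog rather than rendering the CSV inline.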

@lukeyeager
Member

On the other hand, since the list of test data is often quite large (mine has around 1 million images), it would be nice to have a status display directly in the web browser, alongside the terminal info log.

Are you talking about returning intermediate results? Or some kind of progress bar? @Sravan2j, this sounds like the kind of thing you were talking about. I agree, that would be nice.

@VrUnRealEngine4

I will take a shot at providing the requested feature.... However, I will be very busy until this Saturday. I will get a copy of the latest master then and see what I can do over the weekend.

Since I have been doing a great deal of batch runs of 10K to 20K images, there is one thing that would be particularly useful to me and perhaps others: I often need to know what the best epoch was during training. I checked the logs to see if I could discover the best-performing epoch and didn't see anything. Most people would want to use the best epoch on their test datasets, and eyeballing the graph to find the best epoch after a few hundred epochs can be difficult. So I think it would be useful to have, somewhere on the page or as part of the epoch list box, not only each epoch but also its performance.

@lukeyeager
Member

have some place on the page or apart of the epoch list box showing not only the epoch but the performance of the particular epoch

Great idea, why don't you open a new issue and suggest a new feature there. Long threads get confusing.

@tmquan
Author

tmquan commented Apr 16, 2015

@lukeyeager: The reason I'd like to see this enhancement is that when I pulled the newest version from the master branch, it was fine to classify a million images and return the results to the terminal. The problem, however, was on the client browser side: it could not display a million lines of results, and the browser was automatically killed.

Processed 1443682/1443682 images
2015-04-16 03:41:08 [ERROR] Exception on /models/images/classification/classify_many [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1478, in full_dispatch_request
    response = self.make_response(rv)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1566, in make_response
    raise ValueError('View function did not return a response')
ValueError: View function did not return a response

@lukeyeager
Member

Oh wow, so it was a request timeout? Yuck.

I think the best way to solve this would be with the intermediate results / progress bar feature I mentioned earlier. Then the page can return quickly with a "0/1,300,000 images processed" page, and then slowly return the data as it comes through. That's a lot of work, and won't get implemented quickly. For now I'd suggest breaking up your huge textfile into manageable chunks.
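Splitting the input list can be scripted; a minimal sketch (the function name and default chunk size are my own choices, not part of DIGITS):

```python
import os

def split_image_list(list_path, out_dir, chunk_size=10000):
    # Split a "Classify Many" image list into smaller files that the
    # web UI can handle; returns the paths of the chunk files written.
    os.makedirs(out_dir, exist_ok=True)
    with open(list_path) as f:
        lines = [line for line in f if line.strip()]
    chunk_paths = []
    for i in range(0, len(lines), chunk_size):
        chunk_path = os.path.join(out_dir,
                                  'chunk_%04d.txt' % (i // chunk_size))
        with open(chunk_path, 'w') as out:
            out.writelines(lines[i:i + chunk_size])
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk file can then be submitted to "Classify Many" separately and the per-chunk results concatenated afterwards.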

@lukeyeager lukeyeager added the bug label Apr 20, 2015
@joyofdata
Contributor

Or the CSV is stored in a public directory of the server and a link to it is provided. Its presence could be checked by means of a recurring AJAX call. Or, even simpler, the link is simply dead until the CSV is created; if new predictions are appended as they arrive, the link would also allow downloading intermediate results. At the end of the day, those solutions would be simple to implement and would already provide a convenient workflow.
