Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API file by file #10

Merged
merged 7 commits into from Jun 12, 2015
Merged

API file by file #10

merged 7 commits into from Jun 12, 2015

Conversation

oorestisime
Copy link
Contributor

This PR is about the file-by-file API. It implements:

  • checksum search
  • filename & package search
  • suite filter ( name or latest) for both searches

Compared to the api doc there are two differences:

url for file name & package search is slightly different using
/copyright/api/file/path/?package=--
instead of
/copyright/api/file/?path=--&package=--
as it is easier and better to handle paths

The json output for filename and package search differs as if the user doesn't specify a suite name the results can be > 1 with different checksums. The solution was to include everything in a result field. I didn't update the doc for this one as you might have a better solution to come up with.

@oorestisime oorestisime force-pushed the api_file-by-file branch 2 times, most recently from ada84a1 to 6f4b5bb Compare June 10, 2015 09:54
@matthieucan
Copy link
Member

On 10 June 2015 at 11:52, Orestis Ioannou notifications@github.com wrote:

checksum search
filename & package search
suite filter ( name or latest) for both searches

Thanks for the explanation along the PR, it's greatly appreciated. To
do even better next time: it'd be awesome to have a list of routes the
PR implements, e.g. /copyright/api/foo for each item :)

Compared to the api doc there are two differences:

url for file name & package search is slightly different using
/copyright/api/file/path/?package=--
instead of
/copyright/api/file/?path=--&package=--
as it is easier and better to handle paths

No problem, perfect that you updated the doc along.

The json output for filename and package search differs as if the user doesn't specify a suite name the results can be > 1 with different checksums. The solution was to include everything in a result field. I didn't update the doc for this one as you might have a better solution to come up with.

Hm, that's a bit problematic. It would be better, for programs which
will use our API, to always return the same kind of structure, so that
results/errors/parameters are always accessed the same way. I didn't
see that coming with the API specs, but I think it needs some
re-design: something like

{
  return_code: 200/404/etc,
  results:
  [
    {
    meta-information: foo,
    copyright: {...}
    },
    {...}
  ],
  eventual parameters
}

Would this address all the types of queries?

Zack, what do you think?

For the rest, here are a few notes:

  • The file l_helper.py doesn't have a clear name to me, as I couldn't
    guess what it does before reading its content. Any idea? :)
  • In this same file, you use the string 'exception' to pass around the
    message the problem has occured. Why not use try/except's?
  • It would be nice to have appropriate HTTP errors when the API
    returns an error such as "Package required", which should better be
    displayed in an error field. Also, this needs to be documented so that
    API users have the error handling in mind.
  • I don't understand why 'package' is passed as an optional field. Why
    not use /api/file//?

Besides this, the API works fine on my machine. And kudos for all the
tests you implemented!

@oorestisime
Copy link
Contributor Author

Thanks for the quick review :)

On 06/10/2015 08:54 PM, Matthieu Caneill wrote:

On 10 June 2015 at 11:52, Orestis Ioannou notifications@github.com wrote:

checksum search
filename & package search
suite filter ( name or latest) for both searches

Thanks for the explanation along the PR, it's greatly appreciated. To
do even better next time: it'd be awesome to have a list of routes the
PR implements, e.g. /copyright/api/foo for each item :)

Ok works i ll think of that next time :)

[snip]

Hm, that's a bit problematic. It would be better, for programs which
will use our API, to always return the same kind of structure, so that
results/errors/parameters are always accessed the same way. I didn't
see that coming with the API specs, but I think it needs some
re-design: something like

{
return_code: 200/404/etc,
results:
[
{
meta-information: foo,
copyright: {...}
},
{...}
],
eventual parameters
}

Would this address all the types of queries?

IMHO that should be good for all the queries.

Zack, what do you think?

For the rest, here are a few notes:

  • The file l_helper.py doesn't have a clear name to me, as I couldn't
    guess what it does before reading its content. Any idea? :)

license_helper.py, copyright.py

  • In this same file, you use the string 'exception' to pass around the
    message the problem has occured. Why not use try/except's?
  • It would be nice to have appropriate HTTP errors when the API
    returns an error such as "Package required", which should better be
    displayed in an error field. Also, this needs to be documented so that
    API users have the error handling in mind.

ack

  • I don't understand why 'package' is passed as an optional field. Why
    not use /api/file//?

Well yes you are right it's better like that. We can also have
/package/version/ and then we will have the same schema as for
the sources api. Doing so will restrict the results to 0 or 1 and then
the json structure we formed in the doc would be valid as well. OTOH a
user won't be able to see the evolution of the license of the same file
under different versions.

What do you think? Add version as well or not?

Cheers

@zacchiro
Copy link
Member

On Wed, Jun 10, 2015 at 11:54:20AM -0700, Matthieu Caneill wrote:

The json output for filename and package search differs as if the
user doesn't specify a suite name the results can be > 1 with
different checksums. The solution was to include everything in a
result field. I didn't update the doc for this one as you might have
a better solution to come up with.

Hm, that's a bit problematic. It would be better, for programs which
will use our API, to always return the same kind of structure, so that
results/errors/parameters are always accessed the same way. I didn't
see that coming with the API specs, but I think it needs some
re-design: something like

{
  return_code: 200/404/etc,
  results:
  [
    {
    meta-information: foo,
    copyright: {...}
    },
    {...}
  ],
  eventual parameters
}

Would this address all the types of queries?

Zack, what do you think?

I'm not sure I'm reading correctly your example above, but I agree with
you that the results should be uniform. If the lack of uniformity comes
only from the fact that in some case we have only 1 checksum to treat,
and in others more than one, the solution would be to always "group" the
results "by" checksums. And the same technique could be applied to any
other information on which we can "group by".

If your meta-information above is meant to be a "group by" key, then we
are certainly in agreement.

Cheers.

Stefano Zacchiroli . . . . . . . zack@upsilon.cc . . . . o . . . o . o
Maître de conférences . . . . . http://upsilon.cc/zack . . . o . . . o o
Former Debian Project Leader . . @zack on identi.ca . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »

@oorestisime
Copy link
Contributor Author

So i have reworked most of the things (thats why i didn't keep old commits).. Changes:

  • Better handling of exceptions as suggested. Now when there are no d/copyright files in a package it throws a 404 suggestions with versions of the package that have such a file.
  • taking this into account i fixed the way i create the copyright dictionnary. If a package hadn't a d/copyright file the previous version was throwing a 404 version. No i put None as license. Like if the license was not parsable. Do you think it is necessary to add a an extra field when the license is none to be more expressive about the error? I.e saying license missing or license not parsable
  • changed routes to have /package/
  • grouped everything by checksums
  • added return code and error when needed (i.e file or checksum not found)

Here are some routes:
/copyright/api/sha256/?checksum=2e6d31a5983a91251bfae5aefa1c0a19d8ba3cf601d0e8a706b4cfa9661a6b8a
&package=gnubg and/or &suite=latest etc

/copyright/api/gnubg/doc/gnubg/gnubg.html/
?suite=latest etc

Missing:

  • I need to update the doc if you find the json structure ok
  • i removed 'file' from /api/file/package// . I am not sure if we leave like that or maybe add path in the place of file.

@jpleau
Copy link
Contributor

jpleau commented Jun 11, 2015

Hi. The API works great for me here ! I haven't looked at the code yet though.

Something I'm not sure about are the routes.

(Question, I couldn't test it as I wasn't sure how to add a package: What happens if we have a package named 'sha256' ? Since we removed the /[file] prefix, this could clash with /api/sha256)

We now have 3 type of routes for the different APIs we have in Debsources:

  • In sources: /[package]/[version|suite]/[path]
  • Copyright by file: /[package]/[path]/[?suite=]
  • Copyright by checksum: /[sha256]/[?package=&suite=]

(Another subject of discussion, I mentionned it on IRC, the sources api begins with /api/src/, while the copyright api goes with /copyright/api/, we should either go with /api/[component] or /[component]/api, not both)

Since Debsources might host other components in the future, I think it's important to have one standard.

I'm not certain how we can achieve this though.

Proposal (not quite satisfied with this yet, it's up for debate!):

Copyright by file: /file/[package]/[version|suite|all]/[path] -- this would bring back /file that you removed though!

Copyright by checksum: /sha256/[package|all]/[version|suite|all]/checksum

Let me know what you think :)

@oorestisime
Copy link
Contributor Author

Thanks for the review :)

On 06/11/2015 03:31 AM, Jason Pleau wrote:

Hi. The API works great for me here ! I haven't looked at the code yet
though.

Something I'm not sure about are the routes.

(Question, I couldn't test it as I wasn't sure how to add a package:
What happens if we have a package named 'sha256' ? Since we removed the
/[file] prefix, this could clash with /api/sha256)

hm, no idea. I think it will search inside the sha256 package since
there will be a path after that and not a ?checksum. but i get what you
imply :p

We now have 3 type of routes for the different APIs we have in Debsources:

  • In sources: /[package]/[version|suite]/[path]
  • Copyright by file: /[package]/[path]/[?suite=]
  • Copyright by checksum: /[sha256]/[?package=&suite=]

(Another subject of discussion, I mentionned it on IRC, the sources api
begins with /api/src/, while the copyright api goes with
/copyright/api/, we should either go with /api/ or //api, not both)

well the /copyright is the prefix of the blueprint we can't do much
here. In s.d.n it is obligatory but if we were to deploy a c.d.n
/copyright wouldn't be there.

Since Debsources might host other components in the future, I think it's
important to have one standard.

I'm not certain how we can achieve this though.

Proposal (not quite satisfied with this yet, it's up for debate!):

Copyright by file: /file/[package]/[version|suite|all]/path
<this%20would%20bring%20back%20/file%20that%20you%20removed%20though!>

I like that. A user on a file on s.d.n will just have to add the
copyright prefix to get the license of a file. An i forgot the all keyword.
Seems good to me. Matthieu? Zack?

Copyright by checksum: /sha256/[package|all]/[version|suite|all]/checksum

I prefer the way it is since it is the same with the sources api ( i.e
s.d.n/api/sha256/?checksum=d77d235e41d54594865151f4751e835c5a82322b0e87ace266567c3391a4b912
)

Cheers!

@oorestisime
Copy link
Contributor Author

I reintroduced the /file/ suffix in the api as without it i found some errors. I introduced the url schema proposed by jpleau. new routes now are /api/file/package/version/path with version being:

  • latest | introduces a redirection
  • suite alias | introduces a redirection
  • version code
  • all

I had to modify the version handling to achieve this (minor diffs, passing the endpoint as a parameter) hence the new commit.

@matthieucan
Copy link
Member

On 11 June 2015 at 10:57, Orestis Ioannou notifications@github.com wrote:

I reintroduced the /file/ suffix in the api as without it i found some errors. I introduced the url schema proposed by jpleau. new routes now are /api/file/package/version/path with version being:

Yes, that's great, I wanted to ask you that. Better not to have any
possible conflicts in the routes!

All the rest looks super good, but I couldn't test it yet. I'll keep
you in touch!

@matthieucan matthieucan merged commit bd5c35f into Debian:master Jun 12, 2015
@oorestisime oorestisime deleted the api_file-by-file branch June 12, 2015 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants