Allow arbitrary URL rather than IP #14

Closed
LecrisUT opened this issue Feb 25, 2021 · 30 comments
Labels
enhancement New feature or request

Comments

@LecrisUT

Simple as that. On the server side, you could serve some basic information about the server via a JSON API, so the client can check whether the location is a receipt-parser server or not.
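To make the idea concrete, a client could fetch a small JSON document from the entered address and accept it only if it identifies itself as a receipt-parser server. The sketch below assumes a hypothetical payload with `service` and `version` fields. The thread does not define any such format for the real server.

```python
import json
from typing import Optional

# Hypothetical marker value; the real server may identify itself differently.
EXPECTED_SERVICE = "receipt-parser"


def parse_server_info(body: str) -> Optional[dict]:
    """Return the decoded info if `body` looks like a receipt-parser info
    document, otherwise None."""
    try:
        info = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON at all, e.g. the HTML page of another web app
    if not isinstance(info, dict) or info.get("service") != EXPECTED_SERVICE:
        return None  # valid JSON, but from some other service
    if not isinstance(info.get("version"), str):
        return None  # missing or malformed version field
    return info
```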

@monolidth
Member

monolidth commented Feb 25, 2021

Yes, great idea. I will implement this as soon as I have passed my exams. For now, free time does not exist.

Regards,
William

@monolidth monolidth self-assigned this Feb 25, 2021
@monolidth monolidth added the enhancement New feature or request label Feb 25, 2021
@monolidth
Member

I've started working on it. I now use the zeroconf protocol to discover the device IP.

@monolidth
Member

See commit.

@LecrisUT
Author

I would suggest exposing a setting that lets the user choose the IP or address URL to connect to, before falling back to the discovered hostname and IP. In a homelab, let the DNS do its work. The same goes for turning SSL on or off and for using a reverse proxy.
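Here is a minimal sketch of such a setting, built around a hypothetical `ConnectionSettings` model. The user types a host name or IPv4 address, optionally adds a port, and toggles SSL. The client then derives its base URL from those values rather than from the discovered IP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionSettings:
    """Hypothetical user-facing connection settings (not the app's real model)."""

    address: str                # host name or IPv4 address, e.g. "receipt.example.com"
    port: Optional[int] = None  # None = default port of the scheme (443 / 80)
    use_ssl: bool = True        # False = plain HTTP, e.g. inside a trusted LAN

    def base_url(self) -> str:
        scheme = "https" if self.use_ssl else "http"
        host = self.address.strip().rstrip("/")
        netloc = host if self.port is None else f"{host}:{self.port}"
        return f"{scheme}://{netloc}/"
```

With a reverse proxy in front, `ConnectionSettings("receipt.example.com").base_url()` yields `https://receipt.example.com/`, which leaves TLS to the proxy.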

@monolidth
Member

It is already possible to edit the server IP and the server API token; however, it is not possible to edit the zeroconf type.

  • If you wish, I could add an edit option.
  • What is the advantage of using a reverse proxy instead of an SSL certificate in this case?

@LecrisUT
Author

I was referring to the address URL, e.g. receipt.example.com, not the IP itself. Sorry for the confusion. The main use case for a reverse proxy is to have a central web server that manages the SSL certificates, their renewal, etc. My software of choice is Caddy. That way you can run multiple web applications on the same server, e.g. Nextcloud, Gitea, etc.

The main use case for all of this, in particular, is interoperability between receipt-parser and Grocy.
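For readers new to reverse proxies: the proxy reads the `Host` header of each request and forwards the request to the matching local service. That is how several applications can share one server and one certificate manager. The toy lookup below shows only that dispatch step. The host names and ports are made up, and Caddy handles all of this (plus TLS) on its own.

```python
from typing import Optional

# Hypothetical routing table: one public host name per application, all served
# by the same machine; the ports are placeholders.
ROUTES = {
    "receipt.example.com": "http://localhost:8721",
    "cloud.example.com": "http://localhost:8081",
    "git.example.com": "http://localhost:3000",
}


def pick_upstream(host_header: str) -> Optional[str]:
    """Return the backend for a request's Host header, or None if unknown."""
    # Host may carry a port and arbitrary case; this sketch ignores IPv6 literals.
    host = host_header.split(":", 1)[0].strip().lower()
    return ROUTES.get(host)
```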

@monolidth
Member

I've never used Caddy or Grocy before, but I will take a look at them. I have now made it possible to use insecure HTTP requests if needed:

  1. Use the latest receipt parser app
  2. Pull the latest receipt server image

You can now disable secure requests in the config.yaml and in the receipt parser application.

@LecrisUT
Author

Looking at the commits, it is coming along nicely; I'll give it another go when I can. I still couldn't find whether entering a URL instead of an IP is implemented yet. I believe the only issue is the check that the input is a valid IP; otherwise, the base libraries should be able to do the DNS resolution for you.
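One way to relax a strict IP check, sketched with Python's standard library: accept anything that parses as an IP address or as a syntactically valid host name, and let the networking stack do the DNS resolution when the request is made. The function name and the label rules (RFC 1123 style, at most 63 characters per label) are choices made for this sketch and do not come from the app's code.

```python
import ipaddress
import re

# RFC 1123-style host label: letters, digits, hyphens, no leading/trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def classify_address(text: str) -> str:
    """Return "ip" or "hostname"; raise ValueError if the input is neither."""
    candidate = text.strip()
    try:
        ipaddress.ip_address(candidate)
        return "ip"
    except ValueError:
        pass  # not an IP literal, try host-name syntax next
    labels = candidate.rstrip(".").split(".")
    if len(candidate) <= 253 and all(_LABEL.fullmatch(label) for label in labels):
        return "hostname"  # resolution is left to the HTTP/socket library
    raise ValueError(f"not an IP address or host name: {text!r}")
```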

Unrelated, but is there a reason for setting the minimum TLS version to 1.1? To my knowledge, 1.1 is deprecated and should not be used going forward.

@monolidth
Member

> Unrelated, but is there a reason for setting the minimum TLS version to 1.1? To my knowledge, 1.1 is deprecated and should not be used going forward.

You're right. TLS 1.1 is deprecated and will be replaced with TLS 1.3. Thanks for catching that!
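In Python's `ssl` module, the lowest accepted version is set with `SSLContext.minimum_version`. Here is a minimal sketch that assumes the server builds its own context. Loading the certificate and key is left out.

```python
import ssl


def make_server_context() -> ssl.SSLContext:
    """TLS settings for the server side of the connection."""
    # PROTOCOL_TLS_SERVER negotiates the highest version both peers support.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse TLS 1.0 and 1.1, which RFC 8996 deprecates.
    # Use ssl.TLSVersion.TLSv1_3 instead to accept TLS 1.3 only.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # A real server would now call ctx.load_cert_chain(certfile, keyfile).
    return ctx
```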

In terms of security, there are additional bugs, e.g.:

  • the certificate is not validated.

I will assign this to myself. If you find any other bugs, please let me know.
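On the client side, `ssl.create_default_context()` turns on certificate validation by default. Validation is usually lost by setting `check_hostname = False` or `verify_mode = ssl.CERT_NONE`. In the sketch below, the optional `cafile` parameter is an assumption, meant for trusting a self-signed certificate.

```python
import ssl
from typing import Optional


def make_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS settings that actually validate the server certificate."""
    # create_default_context() loads trusted CAs and enables both chain
    # verification (CERT_REQUIRED) and host-name checking.
    # Pass `cafile` to trust a self-signed server certificate explicitly.
    ctx = ssl.create_default_context(cafile=cafile)
    # Do NOT set ctx.check_hostname = False or ctx.verify_mode = ssl.CERT_NONE:
    # that is what disables validation (like curl's -k / --insecure flag).
    return ctx
```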

@monolidth
Member

> I still couldn't find whether entering a URL instead of an IP is implemented yet.

I will implement this feature today, but the zeroconf type has limitations. This format is required: _[name]-service._tcp.local, and I think the current maximum length for the name is 12 characters. I will keep the IP field as a fallback, but this decision is not final.
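For reference, DNS-SD service names follow the rules in RFC 6335. A name is at most 15 characters long, uses letters, digits and single hyphens, and starts and ends with a letter or digit. Under the `_[name]-service` pattern, the `-service` suffix already takes 8 of those 15 characters, which leaves 7 for `[name]`. Here is a small validator, as a sketch:

```python
import re

# RFC 6335 service-name syntax (used by DNS-SD, RFC 6763): 1-15 characters,
# letters/digits/hyphens, starting and ending with a letter or digit,
# no two hyphens in a row.
_SERVICE_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,14}")


def check_service_type(service_type: str) -> str:
    """Validate '_<service>._tcp.local.' and return the bare service name."""
    labels = service_type.rstrip(".").split(".")
    if len(labels) != 3 or labels[1] not in ("_tcp", "_udp") or labels[2] != "local":
        raise ValueError("expected _<service>._tcp.local. or _<service>._udp.local.")
    if not labels[0].startswith("_"):
        raise ValueError("the service label must start with an underscore")
    name = labels[0][1:]
    # The service name must also contain at least one letter.
    if not _SERVICE_NAME.fullmatch(name) or not re.search(r"[A-Za-z]", name):
        raise ValueError(f"invalid service name {name!r} (max. 15 characters)")
    return name
```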

@LecrisUT
Author

> I will implement this feature today, but the zeroconf type has limitations. This format is required: _[name]-service._tcp.local, and I think the current maximum length for the name is 12 characters. I will keep the IP field as a fallback, but this decision is not final.

I don't develop in Java, Python, or for Android, so I'm not familiar with the tools available. What exactly is zeroconf's role? If all the tools just read the relative URL part, i.e. anything after the IP/domain, then simply letting the platform do the heavy lifting should be sufficient.

If you want to give it a go with Caddy, just use something like this in the Caddyfile, point whatever private/public DNS you have access to at the appropriate IP/NAT, and you should be good to go.

```
receipt-parser.example.com {
	# Assuming the server listens on port 8080 on the same host
	reverse_proxy localhost:8080
}
```

I won't be able to do any tests for a few weeks, but I don't remember having any problems accessing the receipt server through a reverse proxy.

@LecrisUT
Author

I remember one application that also uses only gunicorn or uvicorn; maybe check DashMachine for reference. I didn't have to do anything fancy when setting it up, and didn't even have to provide the serving URL.

@monolidth
Member

> I don't develop in Java, Python, or for Android, so I'm not familiar with the tools available. What exactly is zeroconf's role? If all the tools just read the relative URL part, i.e. anything after the IP/domain, then simply letting the platform do the heavy lifting should be sufficient.

Well, zeroconf is not a tool, it is a protocol. It is used to obtain the server configuration without human intervention or specially configured servers.

In this project, zeroconf is used as follows:

  • the server provides a zeroconf service, e.g. _[name]-service._tcp.local
  • the client scans for this service and receives the server IP and the receipt server version

But not the API token. I like this approach, since it only depends on the Python zeroconf library and the bonsoir Flutter library.
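DNS-SD carries metadata such as a version number in a TXT record, which is a sequence of length-prefixed `key=value` strings (RFC 6763, section 6). That is presumably how the server version travels with the service announcement. The sketch assumes the version is published under a key named `version`; the key the receipt server actually uses is not stated here. Libraries such as python-zeroconf build this encoding from a properties dict, so the functions below only illustrate the wire format.

```python
from typing import Dict


def encode_txt(props: Dict[str, str]) -> bytes:
    """Encode key=value pairs as DNS-SD TXT record data (RFC 6763, section 6)."""
    out = bytearray()
    for key, value in props.items():
        entry = f"{key}={value}".encode("utf-8")
        if len(entry) > 255:  # each string is prefixed by a single length byte
            raise ValueError(f"TXT entry for {key!r} is too long")
        out.append(len(entry))
        out += entry
    return bytes(out)


def decode_txt(data: bytes) -> Dict[str, str]:
    """Decode TXT record data back into a dict."""
    props: Dict[str, str] = {}
    i = 0
    while i < len(data):
        length = data[i]
        entry = data[i + 1 : i + 1 + length].decode("utf-8")
        i += 1 + length
        key, _, value = entry.partition("=")
        props.setdefault(key, value)  # RFC 6763: only the first occurrence of a key counts
    return props
```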

@monolidth
Member

I think DashMachine uses Flask and not FastAPI. See here.

@LecrisUT
Author

> • the server provides a zeroconf service, e.g. _[name]-service._tcp.local
> • the client scans for this service and receives the server IP and the receipt server version

So it is not linked to DNS or to how the URL is handled. If you only use it for those two things, you should be able to emulate it by serving the basic information in JSON format at the root URL. Heck, you could even use fancy techniques like DNS SRV records or .well-known to make this service (or several) discoverable.
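Here is a stdlib-only sketch of that suggestion. The server answers `GET /` and a `.well-known` path with a small JSON description of itself. The payload, the path `/.well-known/receipt-parser` and the port are all hypothetical. A production setup would add this route to the server's own web framework instead, and a real `.well-known` suffix is supposed to be registered with IANA (RFC 8615).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload and path: neither is part of the actual receipt server.
INFO = {"service": "receipt-parser", "version": "0.0.0"}  # placeholder version
INFO_PATHS = {"/", "/.well-known/receipt-parser"}


class InfoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in INFO_PATHS:
            self.send_error(404)
            return
        body = json.dumps(INFO).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def serve(host: str = "127.0.0.1", port: int = 8721) -> None:
    """Blocking: serve the info document until interrupted."""
    HTTPServer((host, port), InfoHandler).serve_forever()
```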

But the URL issue seems to be unrelated and can be handled in parallel with the zeroconf service. I suggest that on the client side you don't have the user enter an IP, but any URL, and don't worry about getting the server IP from it; just use it as the base URL. Everything else should work just fine like that, except that scanning for the zeroconf service won't work outside your network.

The main differences from the IP approach are that

  • you can have multiple web services on the same server
  • you can access it from outside your network
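Once the entered URL serves as the base, building API URLs is just a matter of joining relative paths onto it. `urllib.parse.urljoin` does that, but if the base has no trailing slash it drops the base's last path segment. That matters when the server lives under a sub-path. The sketch below normalizes both sides first (the helper name is hypothetical):

```python
from urllib.parse import urljoin


def api_url(base_url: str, path: str) -> str:
    """Join a relative API path onto a user-supplied base URL."""
    # Without a trailing slash, urljoin would replace the last path segment
    # of the base (e.g. a sub-path like /receipts), so add one.
    if not base_url.endswith("/"):
        base_url += "/"
    # A leading slash would make the path absolute and discard the base path too.
    return urljoin(base_url, path.lstrip("/"))
```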

@LecrisUT
Author

> I think DashMachine uses Flask and not FastAPI. See here.

Yes, that is the case. I was just looking at the gunicorn settings and how it is invoked to see whether they need to resolve the server's IP, and they don't.

@monolidth
Member

> you can access it from outside your network

Yes, this was a design decision, because I never thought there would be a use case for this, but apparently there is. I was confused because I thought this was clear. So some steps are necessary to make this work:

  1. Upgrade the TLS version in use
  2. Implement certificate checks
  3. Track the number of requests / load and add fail2ban
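Step 3 can be prototyped inside the server, either before adopting an external tool or in place of one. fail2ban itself works by scanning log files and adding firewall rules. The class below only mimics its policy in-process: a client that fails `max_failures` times within `window` seconds gets banned. All names and defaults are assumptions made for this sketch.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class FailureTracker:
    """Ban a client after too many failed requests within a sliding window."""

    def __init__(self, max_failures: int = 5, window: float = 60.0,
                 ban_time: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.window = window
        self.ban_time = ban_time
        self.clock = clock
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned_until: Dict[str, float] = {}

    def is_banned(self, client: str) -> bool:
        until = self.banned_until.get(client)
        if until is None:
            return False
        if self.clock() >= until:  # ban expired
            del self.banned_until[client]
            return False
        return True

    def record_failure(self, client: str) -> None:
        now = self.clock()
        recent = self.failures[client]
        recent.append(now)
        # Forget failures that fell out of the sliding window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.banned_until[client] = now + self.ban_time
            recent.clear()
```

A request handler would call `is_banned(client)` first and `record_failure(client)` after a failure such as a wrong API token.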

Thanks for pointing me to this issue. I really appreciate the input!
Regards,
William

@LecrisUT
Author

In most situations it would be semi-public, accessed through a VPN. But I do see applications for having it public, e.g. right after shopping or dining, when you are most likely to remember to archive the receipt.

About TLS: I appreciate the work of setting it up, but would you consider offloading all of this to the reverse proxy? My reasoning is as follows:

  • on the local network, it's reasonable to connect via IP if the server runs on a dedicated machine or you assign an IP to it just for this. In that case you are already in a closed environment, so TLS would add little security
  • on a bigger network, or even publicly, accessing via IP becomes impractical, so the server would most likely sit behind a reverse proxy that serves it at a specific domain. The reverse proxy already has mature solutions for serving TLS, so most people would use those. This includes DDoS and fail2ban-style protection.
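Putting the server behind a proxy has one side effect worth handling. Every connection now comes from the proxy, so per-client logic (logging, rate limiting, bans) needs the original address, which proxies such as Caddy pass in the `X-Forwarded-For` header. That header should only be trusted when the direct peer really is the proxy. The sketch below assumes the proxy runs on the same host. uvicorn's `--proxy-headers` and `--forwarded-allow-ips` options implement the same idea.

```python
import ipaddress
from typing import Optional

# Assumption: the reverse proxy runs on the same host as the server.
TRUSTED_PROXIES = {ipaddress.ip_address("127.0.0.1"), ipaddress.ip_address("::1")}


def client_ip(peer: str, x_forwarded_for: Optional[str]) -> str:
    """Real client address for logging/rate limiting behind a reverse proxy."""
    if ipaddress.ip_address(peer) not in TRUSTED_PROXIES or not x_forwarded_for:
        # Direct connection (or no header): a client could forge the header, so ignore it.
        return peer
    # The proxy appends the address it saw as the right-most entry.
    return x_forwarded_for.split(",")[-1].strip()
```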

Can you point me to some basic API paths to test? I did some quick connectivity tests behind a reverse proxy and I do get JSON "not found" responses from the API, so this approach seems to work out of the box for the server part.

@monolidth
Member

> [...] but would you consider offloading all of this to the reverse_proxy?

Sounds good and I agree with both points.

> Can you point me to some basic API paths to test?

Sure, there are two main API entry points.

  1. api/training is used to upload a trainings dataset
 curl -X POST "[URL]:8721/api/training?access_token=[API TOKEN]" -H  "accept: application/json" --data '{"company":"Edeka","date":"12.10.2020","total":"12.00"}'   -k
  1. api/upload is used to upload a receipt
curl -X POST "[URL]:8721/api/upload?access_token=[API TOKEN]&legacy_parser=False&grayscale_image=True" -H  "accept: application/json" -H  "Content-Type: multipart/form-data" -F "file=@image.jpeg;type=image/jpeg" -k
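Here are the same two calls written with Python's standard library. The sketch only builds the requests; sending them is a matter of calling `urllib.request.urlopen`. As in the curl examples, it assumes the token goes in the `access_token` query parameter. Unlike `-k`, `urlopen` keeps certificate verification on. The multipart body is assembled by hand because the standard library has no encoder for it.

```python
import json
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def training_request(base_url: str, token: str, sample: dict) -> Request:
    """POST /api/training with a JSON training sample."""
    query = urlencode({"access_token": token})
    return Request(
        f"{base_url.rstrip('/')}/api/training?{query}",
        data=json.dumps(sample).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def upload_request(base_url: str, token: str, filename: str, image: bytes) -> Request:
    """POST /api/upload with the receipt image as multipart/form-data."""
    query = urlencode({"access_token": token, "legacy_parser": "False",
                       "grayscale_image": "True"})
    boundary = uuid.uuid4().hex
    # Single form part named "file", matching curl's -F "file=@...".
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: image/jpeg\r\n\r\n"
    ).encode("utf-8") + image + f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/api/upload?{query}",
        data=body,
        headers={"accept": "application/json",
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```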

@monolidth
Member

See: f66f456

@monolidth
Member

You can try it and share feedback!

Regards,

William

@LecrisUT
Author

Can you package or pre-release the latest commit?

@monolidth
Member

Yes, I'll prepare a pre-release.

@LecrisUT
Author

You've uploaded v1.1.1 in the pre-release.

@monolidth
Member

monolidth commented Mar 23, 2021

Yes, you're right. I have bumped the receipt-manager version now.

@LecrisUT
Author

The connection works. I have one complaint about the server side, but I'll post it there.

A few notes:

  • please add a "test server connection" option. Right now it is a mystery whether the connection works or not
  • UI: it would be better to group the different connection methods and have a toggle for which one is used: IP/host/zeroconf
  • Camera: access to more fine-grained camera controls would make life much easier; at minimum focus/tap-to-focus and exposure
  • Interface: when bad or no results are obtained, there is no indication in the app
  • Interface: it would be good to expose the StoreNames and be able to add/manage them via the app, and in the future perhaps import/sync them from Grocy as well
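A "test server connection" button could be as small as the probe below. It relies on the observation earlier in this thread that the server answers even unknown API paths with JSON, so any JSON reply counts as success. The function name and messages are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Tuple


def check_connection(base_url: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Probe base_url and return (ok, human-readable message)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            status = resp.status
            content_type = resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        # The server answered, just not with 2xx; that still proves reachability.
        status, content_type = err.code, err.headers.get("Content-Type", "")
    except (urllib.error.URLError, OSError) as err:
        return False, f"cannot reach {base_url}: {err}"
    if "application/json" not in content_type:
        return False, f"{base_url} answered HTTP {status}, but not with JSON; is this the receipt server?"
    return True, f"{base_url} answered HTTP {status} with JSON"
```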

@monolidth
Member

monolidth commented Mar 23, 2021

Totally agree with all points. Great feedback; I appreciate the input and requests!

Could you please create a new issue for each:

  • feature
  • enhancement
  • bug

That would be helpful.

Regards,

William

@LecrisUT
Author

Migrated them, so this issue can be closed.

@monolidth
Member

Update: even I now use a reverse proxy, and it works very well. Thanks for requesting this.
https://receipt-parser.xyz/

@LecrisUT
Author

Welcome to the world of self-hosting :). I can also recommend the subreddit r/selfhosted, where I was introduced to most of these concepts and standards.

2 participants