
Docker swarm server: Cannot make flask frontend work and login (not using default docker-compose) flask overwriting settings values in database #279

Closed
hydrosIII opened this issue Jul 18, 2022 · 17 comments
Labels
docker issue Issue that only occurs when using 4CAT via Docker

Comments

@hydrosIII

Hi, I have 4CAT running on a Docker Swarm server. After slightly modifying the compose file to make it compatible with Docker Swarm, and adjusting some of the environment variables, I got it running, but I cannot log in. I see this is a security feature in Flask. I have read #269, and this is also related to issue #272. I cannot find the whitelist or where it is, since there is no config.py anymore.

Here is a dump of my PostgreSQL settings table; maybe it is relevant:




DATASOURCES               | {"bitchute": {}, "custom": {}, "douban": {}, "customimport": {}, "parler": {}, "reddit": {"boards": "*"}, "telegram": {}, "twitterv2": {"id_lookup": false}}
 4cat.name                 | "4CAT"
 4cat.name_long            | "4CAT: Capture and Analysis Toolkit"
 4cat.github_url           | "https://github.com/digitalmethodsinitiative/4cat"
 path.versionfile          | ".git-checked-out"
 expire.timeout            | 0
 expire.allow_optout       | true
 logging.slack.level       | "WARNING"
 logging.slack.webhook     | null
 mail.admin_email          | null
 mail.host                 | null
 mail.ssl                  | false
 mail.username             | null
 mail.password             | null
 mail.noreply              | "noreply@localhost"
 SCRAPE_TIMEOUT            | 5
 SCRAPE_PROXIES            | {"http": []}
 IMAGE_INTERVAL            | 3600
 explorer.max_posts        | 100000
 flask.flask_app           | "webtool/fourcat"
 flask.secret_key          | "2e3037b7533c100f324e472a"
 flask.https               | false
 flask.autologin.name      | "Automatic login"
 flask.autologin.api       | ["localhost", "4cat.coraldigital.mx", "\"4cat.coraldigital.mx\"", "51.81.52.207", "0.0.0.0"]
 flask.server_name         | ""
 flask.autologin.hostnames | ["*"]
@hydrosIII hydrosIII changed the title Cannot make flaks frontend work again Cannot make flask frontend work and login Jul 18, 2022
@hydrosIII
Author

Where are the Flask settings where I can modify the whitelist?

@hydrosIII
Author

OK, after several attempts I managed to get the settings right. For anyone else who runs into this:


           name            |                                                                            value                                                                             
---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------
 DATASOURCES               | {"bitchute": {}, "custom": {}, "douban": {}, "customimport": {}, "parler": {}, "reddit": {"boards": "*"}, "telegram": {}, "twitterv2": {"id_lookup": false}}
 4cat.name                 | "4CAT"
 4cat.name_long            | "4CAT: Capture and Analysis Toolkit"
 4cat.github_url           | "https://github.com/digitalmethodsinitiative/4cat"
 path.versionfile          | ".git-checked-out"
 expire.timeout            | 0
 expire.allow_optout       | true
 logging.slack.level       | "WARNING"
 logging.slack.webhook     | null
 mail.admin_email          | null
 mail.host                 | null
 mail.ssl                  | false
 mail.username             | null
 mail.password             | null
 mail.noreply              | "noreply@localhost"
 SCRAPE_TIMEOUT            | 5
 SCRAPE_PROXIES            | {"http": []}
 IMAGE_INTERVAL            | 3600
 explorer.max_posts        | 100000
 flask.flask_app           | "webtool/fourcat"
 flask.secret_key          | "2e3037b7533c100f324e472a"
 flask.https               | false
 flask.server_name         | "mydomain.com"
 flask.autologin.api       | ["localhost"]
 flask.autologin.hostnames | ["localhost", "mydomain.com", "*"]
 flask.autologin.name      | ["*"]

@hydrosIII
Author

Unfortunately I have to reopen this issue. Suddenly I cannot log in again, even though nothing changed; I am puzzled.

@hydrosIII hydrosIII reopened this Jul 22, 2022
@hydrosIII
Author

Screenshot from 2022-07-21 22-28-33

@hydrosIII
Author

I just restarted the Docker container and also created a new one. This happens with both the latest tag and the 1.27 tag.

@dale-wahl
Member

Is there anything in your Docker logs to indicate what the problem might be? You can view them in the Docker GUI or via docker logs 4cat_frontend and docker logs 4cat_backend (assuming you did not rename the containers).
There are additional logs in the containers themselves located at /usr/src/app/logs/. The error_gunicorn.log is likely the most interesting for this type of issue.

@dale-wahl dale-wahl changed the title Cannot make flask frontend work and login Docker swarm server: Cannot make flask frontend work and login (not using default docker-compose) Jul 25, 2022
@hydrosIII
Author

I did search the logs for an answer, but I did not see anything useful, so I am posting them here. Frontend container in debug mode: https://pastebin.com/kKa7TD0f ; the backend container logs look normal. An excerpt of the access_gunicorn log, which also seems normal to me: https://pastebin.com/YRTWw31B . The access_log looks the same as the Docker output of the frontend container: https://pastebin.com/yvg1cUUV . frontend_4cat.log is empty. And here is backend_4cat.log, which also seems normal: https://pastebin.com/6mZ0e4Eg . Maybe the problem is the message in the first log about an IP address: "The session cookie domain is an IP address."

And finally, my database tables after tinkering with them to grant more and more permissions: https://pastebin.com/tJ7S6UZx

@dale-wahl
Member

Based on those settings, 4CAT would expect to be hosted at http://51.81.52.207. Possibly, you would want to change the 51.81.52.207 to 4cat.coraldigital.mx and that will fix your problem.

Based on your other posts, I believe you are using a proxy and have modified the Docker setup, so you might also need to adjust the port in some way. Normally, if you are not using port 80 for the frontend container (because you have a proxy like Apache or NGINX on port 80 of your host server passing requests directly to the frontend container), you would want flask.server_name=51.81.52.207:whateverport. If you are using that pull request, I'm not sure how that Docker network is set up or how requests are passed between Traefik and the 4CAT frontend container. It could be that Traefik sends the request to 4cat_frontend, but when 4CAT receives it, it thinks it is at the default port (80) and something breaks down there. I'm not familiar with Traefik and have not tried setting up 4CAT behind a server running in another Docker container. If you knew how Traefik sends that request, you might be able to diagnose it.

Based on those logs, 4CAT is running, but you aren't sending anything to http://51.81.52.207 and instead 4CAT is getting requests for other unexpected addresses. Your proxy needs to take something like https://4cat.coraldigital.mx and convert it to what 4CAT expects (i.e. http://51.81.52.207 in this case though I think you want flask.server_name=4cat.coraldigital.mx). Let me know if that helps!

Good luck!
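One plausible reading of the symptoms above, sketched in simplified form: Flask uses its SERVER_NAME setting when matching incoming requests and, in the Flask releases current at the time of this thread, also as the fallback session cookie domain, which is where the "The session cookie domain is an IP address." warning comes from. If the browser visits 4cat.coraldigital.mx but the cookie is issued for 51.81.52.207, the session (and therefore the login) would not stick. The helpers `split_host`, `host_matches` and `cookie_domain_is_ip` are hypothetical, and the matching rule (lowercase host, default port 80/443 filled in, no IPv6 handling) is a simplification, not Flask's or Werkzeug's actual code.

```python
import ipaddress


def split_host(host: str, default_port: int) -> tuple[str, int]:
    """Split 'name[:port]' into (lowercased name, port); IPv6 literals are not handled."""
    name, sep, port = host.rpartition(":")
    if not sep:  # no explicit port in the string
        return host.lower(), default_port
    return name.lower(), int(port)


def host_matches(request_host: str, server_name: str, scheme: str = "http") -> bool:
    """Simplified check: does the Host header the proxy forwards match the server name?"""
    default_port = 443 if scheme == "https" else 80
    return split_host(request_host, default_port) == split_host(server_name, default_port)


def cookie_domain_is_ip(server_name: str) -> bool:
    """True if a cookie domain derived from server_name would be a bare IP address."""
    name, _ = split_host(server_name, 80)
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Value written to the database vs. the host the proxy actually forwards.
    print(host_matches("4cat.coraldigital.mx", "51.81.52.207"))
    print(host_matches("4cat.coraldigital.mx", "4cat.coraldigital.mx"))
    # Behind a proxy on a non-default port, the port must be part of the name.
    print(host_matches("51.81.52.207:8080", "51.81.52.207"))
    print(cookie_domain_is_ip("51.81.52.207"))
```

With flask.server_name set to 51.81.52.207, a request proxied in as 4cat.coraldigital.mx fails this check, while setting it to 4cat.coraldigital.mx makes it pass, which is consistent with what fixed the login later in this thread.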

@hydrosIII
Author

hydrosIII commented Jul 27, 2022

Yes, that did the trick. I updated the value of flask.server_name in the database to 4cat.coraldigital.mx, opened an incognito window in the browser, and finally I can log in, as long as I don't restart the containers. Why?

Every time I restart the backend Docker container, the value of flask.server_name in the database reverts to 51.81.52.207, no matter what I set it to from the PostgreSQL console. It was originally set to 4cat.coraldigital.mx, which explains the failure when the containers restart. I even filed a bug about the wiki, and the domain appears in my first comments on this issue, so I did not change this; Flask did.

Also, 51.81.52.207 gets added to the values of flask.autologin.hostnames and flask.autologin.api when I restart the frontend container. I really do not know where this value comes from or why it is being substituted. Yes, it is my server's public IP address, but I do not know why Flask writes this value to the database or where it gets it from.

@hydrosIII
Author

hydrosIII commented Jul 27, 2022

My config.ini, in case it helps: Docker backend https://pastebin.com/kqwCzx84 and Docker frontend https://pastebin.com/c7greHjB . As you can see, there is no mention of http://51.81.52.207/, so Flask is autofilling the value in the database. Is it taking it from the proxy? Maybe, but I suspect this happens even before any connection is made.
Also, even if that is the case, should Flask overwrite these values by default?

@hydrosIII hydrosIII changed the title Docker swarm server: Cannot make flask frontend work and login (not using default docker-compose) Docker swarm server: Cannot make flask frontend work and login (not using default docker-compose) flask overwriting values in database Jul 27, 2022
@hydrosIII hydrosIII changed the title Docker swarm server: Cannot make flask frontend work and login (not using default docker-compose) flask overwriting values in database Docker swarm server: Cannot make flask frontend work and login (not using default docker-compose) flask overwriting settings values in database Jul 27, 2022
@dale-wahl
Member

Docker takes it from the .env file and writes that to the database.

Also, if you’re using the docker-compose-public-ip.yml (something like that; I’m on my phone), it sends a command to grab the public IP instead of using the .env file and uses that as the server name (for some super edge-case server setup). I thought it only did that on first run, but I will have to check.
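A minimal sketch of the behaviour described here, under stated assumptions: each time the container's setup step runs, it picks a server name (from the SERVER_NAME environment variable, or from a public-IP lookup when the entrypoint is run with -p), writes it over flask.server_name, and appends it to the autologin lists, as observed earlier in this thread. The function `apply_docker_settings`, the plain dict standing in for the settings table, the "localhost" fallback and the injected `lookup_public_ip` callable are all hypothetical; this is not the code in 4CAT's docker-entrypoint.sh.

```python
from typing import Callable, Optional


def apply_docker_settings(
    settings: dict,
    env: dict,
    use_public_ip: bool = False,
    lookup_public_ip: Optional[Callable[[], str]] = None,
) -> dict:
    """Return a copy of `settings` as a (hypothetical) container setup step would leave it."""
    if use_public_ip:
        if lookup_public_ip is None:
            raise ValueError("use_public_ip=True requires a lookup_public_ip callable")
        server_name = lookup_public_ip()
    else:
        server_name = env.get("SERVER_NAME", "localhost")  # fallback value is an assumption

    updated = dict(settings)
    # Unconditional write: whatever was set by hand in the database is replaced.
    updated["flask.server_name"] = server_name
    # Make sure the name is also in the autologin lists (no duplicates).
    for key in ("flask.autologin.hostnames", "flask.autologin.api"):
        values = list(updated.get(key, []))
        if server_name not in values:
            values.append(server_name)
        updated[key] = values
    return updated


if __name__ == "__main__":
    db = {"flask.server_name": "4cat.coraldigital.mx", "flask.autologin.api": ["localhost"]}
    # Entrypoint run with -p: the looked-up public IP replaces the manual edit.
    print(apply_docker_settings(db, {}, use_public_ip=True,
                                lookup_public_ip=lambda: "51.81.52.207"))
    # Without -p: the SERVER_NAME environment variable is used instead.
    print(apply_docker_settings(db, {"SERVER_NAME": "4cat.coraldigital.mx"}))
```

Because the write is unconditional, a value edited by hand in PostgreSQL would only survive until the next time the setup step runs; if that step runs whenever a container is created, then under Swarm it runs on every recreation.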

@hydrosIII
Author

hydrosIII commented Jul 27, 2022

Ah, I am using neither a .env file nor docker-compose-public-ip.yml, BUT I am using the command from docker-compose-public-ip.yml, docker/docker-entrypoint.sh -p, for the backend container. Why? Because it said "public IP", so I assumed I needed it.

If I replace docker/docker-entrypoint.sh -p with docker/docker-entrypoint.sh (without the -p flag), I now get a database value of 0.0.0.0 instead of the hostname, so Flask is still overwriting my setting.

Either way the database value gets overwritten, but with 0.0.0.0 it works! Maybe it is now taking the value from config.ini, or is it hardcoded elsewhere?

With this change, the public IP is also no longer added to the other settings in the database; a nice 0.0.0.0 is added instead. 0.0.0.0 works for me, and I think it should work in most cases, so I don't know whether this is a bug or a feature. However, I fail to see the logic of this automatic substitution when someone deliberately changes the value in the database, as the wiki says to do, and I also fail to see where it takes the value from; I suspect config.ini. But it is working, so you can close the issue if you want.

Also, you are not wrong that it only does that on first run, but Docker Swarm works differently from Docker Compose. It deploys a stack, and the easiest way to restart the stack is to destroy the containers; Swarm will automatically recreate them with the same configs and everything else the same. This also works for deploying new versions, or when a container fails: Swarm will keep creating new containers until one is not faulty, so the service does not stop working (at least, that is the idea). So most containers only ever run once: they are recreated instead of being started and stopped. If you stop a container, Swarm will automatically create a new one to replace it, because it senses that the service is down. To stop that behavior you have to stop the entire stack.

Take your time, this is not life or death. Don't answer GitHub on your phone. :)

@hydrosIII
Author

Ahhh, on second thought, after rereading your post: it is getting the 0.0.0.0 from the environment variable in my compose file (yes, the one in the pull request). After passing the environment variable SERVER_NAME=4cat.coraldigital.mx to the backend container, I finally get flask.server_name = "4cat.coraldigital.mx" in the database, and it is working. Thank you for pointing that out. I thought SERVER_NAME was a value similar to the one in config.ini, or that it somehow overrode the setting in config.ini, so I used 0.0.0.0 in both; now I know how it is used. Thank you very much. Now I can close this with peace of mind.

@dale-wahl
Member

dale-wahl commented Jul 28, 2022

Glad all that was sorted out! Yeah, the idea was to put any needed configuration values into the .env file. And we need a server name to know where you are going to host 4CAT, so that you can then do any additional setup in the frontend (via the 4CAT settings tab). Perhaps there is a better place for those settings? I added notes to the docker-compose file you used so that it is clearer why it exists and what it does.

Interesting to know that swarm will be destroying and not reusing containers. Do these swarm containers share volumes? I do not think they do by default. All the data collected by 4CAT is stored in the data volume NOT the database (whether or not they should be is another question, but that's a future project). You are likely going to have an issue with a 4CAT node creating a dataset and another node not being able to find it (though a record of the dataset will exist in your database).
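To spot the situation described here, where a dataset record exists in the database but its files live in another node's data volume, one could compare the records against the local data directory. This is a hypothetical sketch: the record fields `key` and `result_file` and the `missing_results` helper are illustrative assumptions, not 4CAT's database schema or file layout.

```python
import tempfile
from pathlib import Path


def missing_results(records: list[dict], data_dir: Path) -> list[str]:
    """Return the keys of dataset records whose result file is not present in data_dir.

    `records` stands in for rows read from the database; the field names
    "key" and "result_file" are hypothetical.
    """
    return [
        record["key"]
        for record in records
        if not (data_dir / record["result_file"]).exists()
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        # Simulate a node whose data volume only holds the file for dataset "abc".
        (data_dir / "abc.csv").write_text("id,body\n")
        records = [
            {"key": "abc", "result_file": "abc.csv"},
            {"key": "def", "result_file": "def.csv"},  # created on another node
        ]
        print(missing_results(records, data_dir))  # ['def']
```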

You may also want to modify docker-entrypoint.sh and remove or comment out this line. It runs an update script, which can be a bit time-consuming, though not harmful. It would normally only run on the first creation and then, if there is an existing config volume, never again (well, unless you update to a newer version of 4CAT). You don't actually need it with a fresh container; it is only necessary if you rebuild a container with a newer 4CAT version. You could also mount the config volume somewhere shared between nodes.

You just gave me an idea for how to prevent that script from running on a brand new build. Thanks. Past me already fixed this, haha.
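Since Swarm recreates containers rather than restarting them, a "first run" check cannot rely on anything stored inside the container itself; it has to look at something on a volume that outlives the container, such as the config volume mentioned above. A minimal sketch of that pattern, assuming a hypothetical marker file (`.setup-done`) and a `setup` callable standing in for the update script; it is not how 4CAT itself decides whether to run the script.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_once(marker: Path, setup: Callable[[], None]) -> bool:
    """Run `setup` only if `marker` does not exist yet; return True if it ran.

    `marker` should live on a volume that survives container recreation
    (e.g. a shared config volume), not in the container's own filesystem.
    """
    if marker.exists():
        return False
    setup()
    # Write the marker only after setup succeeded, so a failed run is retried.
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as volume:
        marker = Path(volume) / "config" / ".setup-done"  # hypothetical marker name
        for attempt in range(3):  # simulate three container recreations
            ran = run_once(marker, lambda: print("running update script"))
            print(f"container #{attempt + 1}: setup ran = {ran}")
```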

@hydrosIII
Author

hydrosIII commented Jul 28, 2022

Yes. Swarm is basically the same as Docker with Compose, but a bit more advanced. At first I found the idea of destroying containers weird, but it makes sense: Swarm is built around the service, so if a container of a service is failing, it just creates a new one. In practice this means the service is not suspended during updates, because when you update, the previous container is destroyed only after the new one has been created, so there is virtually zero interruption of the service. But yes, it means that anything in the container that runs on first boot will run again every time you update or change something in the service.

Regarding the .env file: you can use one here too, but it applies to the whole YAML file and is only used to substitute values. To pass environment variables into a container, you have to list them in the environment section of each container. That was the main difference I found. But a .env file could be used as well.
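The distinction above can be modelled in a few lines: the .env file only feeds ${VAR} substitution in the YAML text, so a value reaches the container only if the service's environment section references it. In this simplified sketch, Python's string.Template stands in for Compose's interpolation, which also supports forms such as ${VAR:-default} and substitutes an empty string (with a warning) for unset variables, neither of which is modelled here. The helper names and the compose fragment are illustrative.

```python
from string import Template


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Deliberately minimal: no quoting, escaping or 'export' handling.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Only variables referenced in the service's environment section reach the container.
COMPOSE_FRAGMENT = """\
services:
  backend:
    environment:
      - SERVER_NAME=${SERVER_NAME}
"""


def render_compose(fragment: str, env_values: dict[str, str]) -> str:
    """Substitute ${VAR} references in the YAML text (stand-in for Compose interpolation)."""
    # Unlike Compose, Template.substitute raises KeyError for a missing variable.
    return Template(fragment).substitute(env_values)


if __name__ == "__main__":
    env_file = "# .env\nSERVER_NAME=4cat.coraldigital.mx\nUNUSED=ignored\n"
    print(render_compose(COMPOSE_FRAGMENT, parse_env_file(env_file)))
```

UNUSED is read from the .env file but never appears in the rendered YAML, so the backend container would not see it.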

Yes, containers share volumes; everything is the same here, no difference from plain Docker.

I did not have to separate all the shares, certainly not the data share, only the config share, because of the problematic api_host setting. For the backend, api_host needs to be 0.0.0.0 or 127.0.0.1, and for the frontend it needs to be the name of the backend container on the Docker network, in this case api_host = 4cat_backend. I tried some workarounds by editing /etc/hosts in the backend, but they did not work. Why? I think because the links option for connecting containers is now deprecated (https://docs.docker.com/network/links/), and Docker recommends using network aliases instead. I think in Swarm this is mandatory, because I could not connect otherwise.
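The asymmetry described here (the backend listens on all interfaces, while the frontend has to connect to the backend by its name on the Docker network) can be expressed as two role-specific config fragments. This sketch uses Python's configparser; the `[API]` section name and the `api_config` helper are assumptions for illustration, and only the api_host values come from this thread.

```python
import configparser
import io


def api_config(role: str, backend_alias: str = "4cat_backend") -> str:
    """Return a config.ini fragment with the api_host a given container role needs."""
    if role == "backend":
        host = "0.0.0.0"  # listen on all interfaces inside the backend container
    elif role == "frontend":
        host = backend_alias  # reach the backend via its name/alias on the Docker network
    else:
        raise ValueError(f"unknown role: {role!r}")
    parser = configparser.ConfigParser()
    parser["API"] = {"api_host": host}  # section name is an assumption
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    for role in ("backend", "frontend"):
        print(f"# config.ini for {role}")
        print(api_config(role))
```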

Other than that, the data volume is shared, so there is no problem there; it is just that one line in the config file that needs to change.
It could be problematic if you run Docker Swarm across many nodes, but my setup has only one node, so I cannot test further. For that case you would use tags and/or shared storage, which let you control which containers run on which nodes.

Docker Swarm also lets you store config files and secrets, so you can keep static config files there instead of modifying the data volumes. For me, it keeps things organized.

@hydrosIII
Author

One off-topic question: where is the settings tab in 4CAT? And where can I find documentation on using 4CAT?

@dale-wahl
Member

dale-wahl commented Aug 2, 2022

Settings tab:
[Screenshot]

Documentation is in the wiki, in the code (some datasources have their own README.md files) or non-existent. A colleague has been working on a documentation pull request, but I do not believe it is ready.

And I was specifically asking whether Swarm nodes share volumes, as that would be necessary if you expect a node to have access to datasets or analyses created on other nodes. Just trying to head off future issues for you, but it sounds like you only have one node anyway.

@stijn-uva stijn-uva added the docker issue Issue that only occurs when using 4CAT via Docker label Aug 26, 2022
Projects
None yet
Development

No branches or pull requests

3 participants