Documentation Clarification Needed #12

Open
mokahless opened this issue Jun 13, 2023 · 8 comments

Comments

@mokahless

I'm running into a number of errors attempting to get this running. If any of them are real errors and not my own mistakes, then I will create separate issues for them. For now, I am assuming this is my own misunderstanding of the instructions, hence this issue requesting better documentation.

  1. Instructions to download the contents of the git repository into the working directory are missing from the docker setup (I realized this when I read what docker build does).
  2. Instructions on where to place the reddit dumps are missing.
  3. A list of prerequisites is missing (and/or not everything is installed by the scripts).
  4. Is elasticsearch something I need to set up entirely separately and direct redarc to? I'm confused as to how this works.

I can't for the life of me figure this second one out. I tried searching the codebase for references to the submissions zst files and couldn't find anything.

For number 3, I found I needed the following to get the first script running:
python3
python3-pip
pip install psycopg2-binary

I ran the first script on reddit/submissions/2023-09.zst. I am unsure if that's what I'm supposed to do. Anyway, I tried running it from both docker exec inside the redarc container and from outside the container. Either way, I would get some sort of connection error. Wrong password or connection refused, depending on... I don't know. Oddly, it seems to be attempting to connect to localhost? That's not where the postgres db is. And the working directory is only in the redarc container. Maybe I'm misunderstanding this.

The web frontend does load. But obviously as above, there's no subreddits listed.

Cheers and thanks for the excellent frontend.

@Yakabuff
Owner

For dumps, you need to decompress the zstd files with unzstd <filename>. I will need to add that to the docs; thanks for pointing that out.

The scripts should be run outside of the docker container. If you used the default postgres installation specified in the README, you shouldn't have to change the scripts. The database should be running on localhost:5432 with the password test1234. If you did change something or installed it on a different machine, you will need to update the script with the correct credentials.

$ docker pull postgres

$ docker run \
  --name pgsql-dev \
  --network redarc \
  -e POSTGRES_PASSWORD=test1234 \
  -d \
  -v ${PWD}/postgres-docker:/var/lib/postgresql/data \
  -p 5432:5432 postgres 
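
As a quick sanity check before running the scripts, something like the following could confirm that a service is accepting TCP connections on the host and port given above. This is only a sketch using Python's standard library: port_is_open is a hypothetical helper, not part of redarc, and it does not verify the password.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError on failure
        # (connection refused, timeout, unreachable host, bad hostname).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (assumes the default setup above, which publishes postgres on 5432):
# print(port_is_open("localhost", 5432))
```

A "connection refused" from the scripts would correspond to this returning False, while a "wrong password" error means the port is reachable and the credentials are the problem.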

> And the working directory is only in the redarc container.

Could you elaborate on this?

Thank you for the feedback, I agree the docs need to be improved.

@mokahless
Author

mokahless commented Jun 13, 2023

I had incorrectly assumed the postgres data was stored in the container, but it seems it was stored in the working directory, so every time I recreated the containers, the password was stuck at my altered one. I wiped that directory and started again, and it looks like the scripts will work now.

apt install zstd and zstd --memory=2048MB -d submissions.txt are what I found on debian/ubuntu.
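
For several dump files, the same command could be wrapped in a small Python loop. This is a rough sketch, not a redarc script: decompress_command and decompress_all are hypothetical names, and it assumes the zstd binary is on the PATH and that all dumps sit in one directory.

```python
import subprocess
from pathlib import Path


def decompress_command(path: Path, memory: str = "2048MB") -> list[str]:
    """Build the zstd invocation quoted above for a single file."""
    return ["zstd", f"--memory={memory}", "-d", str(path)]


def decompress_all(directory: Path) -> list[Path]:
    """Run zstd on every .zst file in `directory`; return the files processed."""
    processed = []
    for path in sorted(directory.glob("*.zst")):
        # check=True raises CalledProcessError if zstd exits with an error.
        subprocess.run(decompress_command(path), check=True)
        processed.append(path)
    return processed
```

Calling decompress_all(Path("dumps")) would process each .zst file in a hypothetical dumps directory in sorted order and stop at the first failure.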

> Could you elaborate on this?

I just meant that postgres container didn't seem to have access to the scripts but doesn't really matter, since I was supposed to run the scripts outside the containers anyway.

I'm stuck at the index.py script now, however. I'm using 2009-03 from the full dump, both submissions and comments decompressed and imported. However, the index.py script seems to not recognize any subreddits. I've mainly been trying "iphone" since basedbin.org shows entries from those dates.

Indexing: ['iphone']
INDEXING: iphone
Traceback (most recent call last):
  File "/root/redarc/scripts/index.py", line 37, in <module>
    cur.execute("select COUNT(*) from submissions where subreddit = %s;", (sub,))
psycopg2.errors.UndefinedTable: relation "submissions" does not exist
LINE 1: select COUNT(*) from submissions where subreddit = 'iphone';

As an aside, is there a way to simply import all subreddits to index.py? This was my initial intention, so even if I got this working, doing it one-by-one even if I found a list online would be rather tedious.

@Yakabuff
Owner

Yakabuff commented Jun 13, 2023

It's saying that the submissions table doesn't exist? That's very odd.

Can you verify the following?

  1. Is there a submissions table?
  2. Is there any data in your database?
  3. Are there any errors in the log file generated in your current working directory?

You can do that with psql -h localhost -U postgres -a and then running select * from submissions where subreddit='iphone'

I noticed you said you wiped/reset your database after running the container. The container runs a set of scripts that set up the postgres container on startup. You would need to delete and rerun your container if you haven't done so already.

@Yakabuff
Owner

> As an aside, is there a way to simply import all subreddits to index.py? This was my initial intention, so even if I got this working, doing it one-by-one even if I found a list online would be rather tedious.

I haven't made a script that does that, but you could in theory make a script that downloads and decompresses all the dumps into a directory and then runs the load scripts on each file.
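
If the goal is mainly to avoid typing subreddit names by hand, one option might be to collect the distinct names from a decompressed dump first. The sketch below assumes each line of the file is a JSON object with a subreddit field. list_subreddits is a hypothetical helper, and how its output would be passed to index.py is not covered here.

```python
import json
from pathlib import Path


def list_subreddits(dump_path: Path) -> list[str]:
    """Return the sorted, distinct subreddit names found in a line-delimited JSON dump."""
    names = set()
    with dump_path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than aborting the scan
            sub = record.get("subreddit") if isinstance(record, dict) else None
            if isinstance(sub, str):
                names.add(sub)
    return sorted(names)
```

The file is read line by line, so memory use depends on the number of distinct subreddits and not on the size of the dump.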

@drphero

drphero commented Jun 20, 2023

> It's saying that the submissions table doesn't exist? That's very odd.
>
> Can you verify the following?
>
> 1. Is there a submissions table?
> 2. Is there any data in your database?
> 3. Are there any errors in the log file generated in your current working directory?
>
> You can do that with psql -h localhost -U postgres -a and then running select * from submissions where subreddit='iphone'
>
> I noticed you said you wiped/reset your database after running the container. The container runs a set of scripts that sets up the postgres container on startup. You would need to delete and rerun your container if you haven't done so already

I'm having some trouble with this too, except with no errors. The docker container is running using docker compose with no errors. The scripts ran without errors. I'm able to view the frontend at localhost:8080. But the subreddit data isn't there. When I run psql -h localhost -U postgres -a followed by select * from submissions from inside the container, it outputs nothing. Am I missing something? I'm running those last two commands from inside the pgsql-dev container. Is that not right?

EDIT:
The select queries need a semicolon at the end in order to work. With that I was able to get results. And with the \dt command I can see that the tables are indeed there. However, there is still nothing except the skeleton displaying in the frontend.

EDIT 2:
Just tried on an ubuntu vps and getting the exact same result.

EDIT 3:
I think there is a typo in docker-compose.yml because the get request looks like this:
{ "GET": { "scheme": "http", "host": "localhost", "filename": "/api//search/subreddits" } } with an extra / after api. After removing the containers and removing the extra / it still isn't able to access /api/search/subreddits.
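
A doubled slash like this typically appears when a base URL ending in / is concatenated with a path starting with /. As an illustration only (join_url is a hypothetical helper written in Python, not redarc's frontend code), a join that normalizes the boundary could look like:

```python
def join_url(base: str, path: str) -> str:
    """Join base and path with exactly one slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")


# Naive concatenation reproduces the doubled slash seen in the request.
naive = "http://localhost/api/" + "/search/subreddits"
fixed = join_url("http://localhost/api/", "/search/subreddits")

print(naive)  # http://localhost/api//search/subreddits
print(fixed)  # http://localhost/api/search/subreddits
```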

EDIT 4:
The console shows this:

Uncaught (in promise) ReferenceError: setThreads is not defined
Jp http://localhost:8080/assets/index-7af69477.js:40
promise callbackJp/< http://localhost:8080/assets/index-7af69477.js:40
fi http://localhost:8080/assets/index-7af69477.js:40
Bn http://localhost:8080/assets/index-7af69477.js:40
Up http://localhost:8080/assets/index-7af69477.js:40
j http://localhost:8080/assets/index-7af69477.js:25
qe http://localhost:8080/assets/index-7af69477.js:25
EventHandlerNonNull
http://localhost:8080/assets/index-7af69477.js:25
http://localhost:8080/assets/index-7af69477.js:25

EDIT 5:
The problem appears to be entirely with using docker compose. When starting fresh with the docker run instructions it works! Not sure what about compose is causing this problem.

@Yakabuff

This comment was marked as outdated.

@Yakabuff
Owner

@drphero

- "8080:80"
Try changing 8080:80 to 80:80

@drphero

drphero commented Jun 25, 2023

> @drphero
>
> - "8080:80"
>
> Try changing 8080:80 to 80:80

This works, thanks!

3 participants