clustering corona sequences to trace origin #128

Open
KenSaville opened this issue Sep 21, 2020 · 2 comments

Comments

@KenSaville

The command

cat metadata.txt | grep Dec | grep complete | grep -v gapped | cut -f 1 > early.ids

returns no lines from metadata.txt.

I believe this is because the dates are in numeric form.

Grepping for 2019 might solve the problem, but it would not work if there were sequences from other months of 2019.
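If the dates are numeric, one possible workaround is to match the year and month together rather than the year alone, so other months of 2019 are excluded. The sketch below assumes the dates are written as YYYY-MM-DD (for example 2019-12-30) and that the accession is in the first tab-separated column, as the original cut -f 1 implies; the function name early_ids is hypothetical. It reads the metadata on standard input so it can be tried without the real file.

```shell
#!/usr/bin/env bash
# early_ids (hypothetical helper): read tab-separated metadata on stdin and
# print the first column of complete, non-gapped records dated December 2019.
# ASSUMPTION: dates use the numeric YYYY-MM-DD form, e.g. 2019-12-30.
early_ids() {
  grep -F '2019-12-' \
    | grep complete \
    | grep -v gapped \
    | cut -f 1
}
```

Usage: early_ids < metadata.txt > early.ids. If the file has more than one date column (say, a collection date and a release date), the plain pattern may match the wrong one; filtering a single column with awk, e.g. awk -F '\t' '$N ~ /^2019-12/' where N is the collection-date column, would be stricter.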

@ialbert
Member

ialbert commented Sep 21, 2020

There is a major problem with the entire book: in subsequent months NCBI changed many of its data formats, because they themselves were not sure of the most appropriate way to distribute the data. The concepts are still valid; only slight changes in the data make the code behave differently.

The whole book will be rewritten over the next two months using a new NCBI service called datasets:

https://www.ncbi.nlm.nih.gov/datasets/

@KenSaville
Author

KenSaville commented Sep 21, 2020 via email


2 participants