Multimodal language models for GalaxyZoo image interpretation

Rationale

The rationale of this project is to leverage existing Large Multimodal Models (LMMs) to engage meaningfully with astronomical images. The overarching goal is to fine-tune a language-and-vision model such as LLaVA on a curated dataset from the Galaxy Zoo project.

You can see examples of these conversations on Galaxy Zoo Talk here:

https://www.zooniverse.org/projects/zookeeper/galaxy-zoo/talk/1270


The steps of the project are as follows:

  1. Explore the Galaxy Zoo Talk dataset.
  2. Read and understand the high-level details of the LLaVA and LLaVA-Med papers.
  3. Summarise the discussion text with an LLM, using either open-source or proprietary models.
  4. Curate the image-summary pairs for instruction tuning (see the sketch after this list).
  5. Fine-tune the model.
  6. Evaluate the model.
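As a rough illustration of steps 3 and 4, the sketch below summarises the comments of each Talk thread with an off-the-shelf open-source summarisation model and writes LLaVA-style instruction-tuning records. It is a minimal sketch under assumptions: the input file name, its fields, the prompt, and the choice of summarisation model are all placeholders rather than part of the actual project pipeline.

```python
# Sketch: summarise Galaxy Zoo Talk threads and build LLaVA-style
# instruction-tuning records. File names, field names, and the prompt
# below are illustrative placeholders.
import json
from transformers import pipeline

# Any open-source summarisation or chat model could be swapped in here.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Hypothetical input: one record per Talk thread, with the subject id,
# the image file name, and the list of comments.
with open("galaxy_zoo_talk_threads.json") as f:
    threads = json.load(f)

records = []
for thread in threads:
    comments = " ".join(thread["comments"])
    # Long threads may exceed the model's input length; truncation keeps
    # the sketch simple, a real pipeline might chunk the thread instead.
    summary = summarizer(comments, max_length=128, min_length=20,
                         do_sample=False, truncation=True)[0]["summary_text"]

    # Conversation-style record: the human turn carries the <image> token,
    # the assistant turn carries the summarised description.
    records.append({
        "id": thread["subject_id"],
        "image": thread["image_file"],
        "conversations": [
            {"from": "human", "value": "<image>\nDescribe this galaxy."},
            {"from": "gpt", "value": summary},
        ],
    })

with open("galaxy_zoo_instruct.json", "w") as f:
    json.dump(records, f, indent=2)
```

A proprietary API could replace the summarisation pipeline; the record layout shown above mirrors the conversation-style JSON used for LLaVA instruction tuning.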

In the LLaVA architecture, the pre-trained CLIP visual encoder ViT-L/14 is connected to the LLaMA decoder.
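For intuition, here is a minimal PyTorch sketch of that connection: CLIP patch features are mapped by a projection layer into the language model's hidden size so they can be consumed alongside token embeddings. This illustrates the idea only; the model names, the 4096 hidden size (LLaMA-7B), and the single linear projection are assumptions, not LLaVA's actual implementation.

```python
# Minimal sketch of a LLaVA-style vision-to-language connection.
import torch.nn as nn
from transformers import CLIPImageProcessor, CLIPVisionModel

class VisionToLanguageProjector(nn.Module):
    def __init__(self, llm_hidden_size=4096,
                 vision_model="openai/clip-vit-large-patch14"):
        super().__init__()
        self.vision_tower = CLIPVisionModel.from_pretrained(vision_model)
        # A single linear layer maps CLIP features to the LLM width
        # (the original LLaVA uses a linear projection; later versions use an MLP).
        self.projector = nn.Linear(
            self.vision_tower.config.hidden_size, llm_hidden_size)

    def forward(self, pixel_values):
        # Patch-level features, dropping the [CLS] token.
        feats = self.vision_tower(pixel_values).last_hidden_state[:, 1:, :]
        # Output shape: (batch, num_patches, llm_hidden_size), ready to be
        # prepended to the text embeddings fed to the LLaMA decoder.
        return self.projector(feats)

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
# Example (with `images` a list of PIL images):
# image_tokens = VisionToLanguageProjector()(
#     processor(images, return_tensors="pt").pixel_values)
```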

You can watch the hack presentation given by Jo during the telecon.

There is also a good video describing multimodal language models here: https://www.youtube.com/watch?v=mkI7EPD1vp8

Dataset

References

Here is a list of references to get started on the subject.

LLM-specific resources:
