
Extracting chunks of long text as its own record #177

Closed
ch264 opened this issue May 25, 2023 · 2 comments

ch264 commented May 25, 2023

I would like to break up long text pages into individual records.

I can do that in my query like so:

const queries = [
  {
    query: pageQuery,
    transformer: ({ data }) => {
      return data.allMarkdownRemark.edges.map(edge => edge.node).reduce((acc, post) => {
        // Split the raw Markdown body on every "##" heading
        const pChunks = post.rawMarkdownBody.split('##');

        // Build one record per chunk; each record reuses the post's id as its objectID
        const chunks = pChunks.map(chnk => ({
          objectID: post.id,
          headings: post.headings,
          fields: post.fields.slug,
          title: post.frontmatter.title,
          internal: post.internal,
          content: chnk
        }));
        return [...acc, ...chunks];
      }, []);
    },
    indexName: algoliaIndex,
  },
];

In the console I can see that breaking the pages up into individual objects works and follows Algolia's recommendation for splitting up long documents:

[Screenshot: console output showing a page split into multiple chunk objects]

However, when I run 'gatsby build', only the last paragraph of each page makes it into the Algolia index:

[Screenshot: Algolia index showing only the last chunk of each page]

Is there a way to ensure that all of the split-up objects from a page get into the Algolia index? I am unsure how to troubleshoot this. Is breaking up long text documents possible with this plugin?

Thanks so much for your help

Haroenv (Contributor) commented May 26, 2023

Hi, that's because you use the same objectID for every chunk of a post; you need to include the chunk index in it as well. The fixed version would be:

const queries = [
  {
    query: pageQuery,
    transformer: ({ data }) => {
      return data.allMarkdownRemark.edges.map(edge => edge.node).reduce((acc, post) => {
        const pChunks = post.rawMarkdownBody.split('##');

        // Append the chunk index so every record gets a unique objectID,
        // otherwise later chunks overwrite earlier ones in the index
        const chunks = pChunks.map((chnk, index) => ({
          objectID: post.id + '-' + index,
          headings: post.headings,
          fields: post.fields.slug,
          title: post.frontmatter.title,
          internal: post.internal,
          content: chnk
        }));
        return [...acc, ...chunks];
      }, []);
    },
    indexName: algoliaIndex,
  },
];
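
A related note on search quality: once each page is split into several records, Algolia's guidance on long documents also suggests deduplicating at query time so a search returns one hit per page rather than one per chunk. A minimal sketch of how that could look here, assuming the plugin's optional per-query settings object and using a hypothetical transformChunks helper to stand in for the transformer above (the slug stored in fields serves as the grouping attribute):

const queries = [
  {
    query: pageQuery,
    // transformChunks is a hypothetical stand-in for the chunking transformer above
    transformer: ({ data }) => transformChunks(data),
    indexName: algoliaIndex,
    settings: {
      // Group all chunk records that share the same slug stored in `fields`...
      attributeForDistinct: 'fields',
      // ...and return only the single best-matching chunk per page
      distinct: true,
    },
  },
];

With distinct enabled, every chunk's content is still searchable, but the result list collapses to one hit per page.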

ch264 (Author) commented May 26, 2023

Thanks a million for your help @Haroenv. That worked!

ch264 closed this as completed May 26, 2023