Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes search index format to fasten pod search --full command. #265

Merged
merged 10 commits into from
Oct 27, 2015
Merged

Changes search index format to fasten pod search --full command. #265

merged 10 commits into from
Oct 27, 2015

Conversation

manuyavuz
Copy link
Member

Detailed discussion can be found here: CocoaPods/cocoapods-search#8

  • Previous implementation was having the drawback of traversing and performing gsub on all index strings for each pod specification.
  • New implementation stores words as keys and list of pods containing corresponding word inside their specification as the values for corresponding hash keys. By this way, sources manager can perform a faster search operation.

rescue
CoreUI.warn "Skipping `#{set.name}` because the podspec contains " \
'errors.'
result
return []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return keyword is superfluous

index['BananaLib'].should == ['BananaLib']
index['JSONKit'].should == ['JSONKit']
index['JSONSpec'].should == ['JSONSpec']
index['Faulty_spec'].should.nil?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should.be.nil

@manuyavuz
Copy link
Member Author

Got an offense from RuboCop about using return while returning two parameters.

I think I have to use return in this case, so we can ignore it.

Or maybe I can return an array value including two parameters. Which makes more sense?

output = `git pull --ff-only`
changed_spec_paths = (`git diff --name-only #{prev_commit_hash}..HEAD`).strip.split("\n")
end
return output, changed_spec_paths
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[output, changed_spec_paths], no return

@segiddins
Copy link
Member

@manuyavuz are these PRs ready for final review?

@manuyavuz
Copy link
Member Author

It lacks specs but I think we should be in the same line about the implementation before starting specs. So I think it's ready for review regarng implementation. What do you think?

# Returns the list of changed spec paths.
#
def update(show_output = false)
changed_spec_paths = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be []

@manuyavuz
Copy link
Member Author

If we're OK with the current state, I'll move on adding specs. Just let me know.

@segiddins
Copy link
Member

I am, but maybe @floere has some thoughts

result
string = set.name.dup
if spec.summary
string << ' ' << spec.summary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Ruby, it's generally better to avoid string << ' ' << spec.summary because with each <<, a String is created. Use arrays to << the strings, then join once at the end.

Or better yet, in this case, only split where necessary (e.g. in summary) and append the result to the array using +=, then uniq at the end.

@floere
Copy link
Member

floere commented Oct 20, 2015

Only one comment, repeated here, refering to the latest commit where this code is located, string << ' ' << spec.summary:
In Ruby, it's generally better to avoid string << ' ' << spec.summary because with each <<, a String is created. Use arrays to << the strings, then join once at the end.
Or better yet, in this case, only split where necessary (e.g. in summary) and append the result to the array using +=, then uniq at the end.

@floere
Copy link
Member

floere commented Oct 20, 2015

Happy to wait for the specs now 😊 Thanks for all your work!!!

@manuyavuz
Copy link
Member Author

Great hint @floere 👏🏼

I'm starting to specs 💥

@manuyavuz
Copy link
Member Author

Done with specs. You can check out and review now.

I'll rewrite history a little to have a convenient history. Other than that, I think this is ready to go to master.

@segiddins
Copy link
Member

Looks amazing to me 👍
But I also want to let @floere have one last look

(`git rev-parse HEAD` || '').strip
end

def update_git_repo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should override this in CocoaPods/CocoaPods to use executable, so the output is still streamed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed because it's output is handled and streamed in update method above.
Look at the lines between 292-300. It uses CoreUI which is handled by CocoaPods by default in user_interface.rb as far as I understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the output here will no longer be streamed -- it will only be printed after the command is finished.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this prints after command is finished. So what should we do?

I think we're gonna remove CoreUI.puts output if show_output part in update method here in order not to have duplicate output. However, I didn't fully get how to override a method of Source class in cocoapods code (probably in sources_manager.rb but how)?

I'm not a native ruby person so probably I miss sth :)

@manuyavuz
Copy link
Member Author

I think I've found the way to override update_git_repo method in CocoaPods/CocoaPods in order to enable output streaming. Sample code:

  class Source
    extend Executable
    executable :git

    alias_method :_original_update_git_repo, :update_git_repo

    def update_git_repo
      git %w(pull --ff-only)
    end
  end

I will add to CocoaPods/CocoaPods#4249 in next commit.

@segiddins
Copy link
Member

@manuyavuz no need to alias

@manuyavuz
Copy link
Member Author

It was really simple then 👍
We also need to move UI.warn logic into here because Executable does not set $? so CoreUI.warn in here does not called when pull fails.

Regarding this, we may completely remove UI printing here, and handle it in sources_manager again, this time in above overridden method. What do you think?

@segiddins
Copy link
Member

Sure

result = {}
sets.each do |set|
word_list_from_set(set).each do |w|
(result[w] ||= []).push(set.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the average number of words in a wordlist, using the set name/identifier as a symbol will use a bit less space. If it's only used to generate the file index, and not for storing it in memory (and search operations) it's not that crucial.

@floere
Copy link
Member

floere commented Oct 22, 2015

Looks good – only had comments on the index creation. Thanks again for your work, @manuyavuz! 🚀

@manuyavuz
Copy link
Member Author

I believe we need a spec for this method, since the index data is so crucial for later searching.

@floere Actually the new specs I've added implicitly verify this method's correctness. What type of checks you think needed more? Like specs having no description, or authors part? Or like checking if strings are splitted correctly?

Could you share if you have edge cases in your mind which you feel does not being tested now? I can add them accordingly.

@floere
Copy link
Member

floere commented Oct 25, 2015

@manuyavuz Just checking how strings are split. Very simple, maybe something as simple as a.b-c/d -> %w(a b c d). This will help us later find bugs (since people like their pods be found and will complain if they can't), and guard against unintended consequences to tokenization changes. Something which the implicit tests can't do very well.

- Previous implementation was having the drawback of traversing and performing `gsub` on all index strings for each pod specification.
- New implementation stores words as keys and list of pods containing corresponding word inside their specification as the values for corresponding hash keys. By this way, sources manager can perform a faster search operation.
@manuyavuz
Copy link
Member Author

@floere I've added a spec for word_list_from_set method. Please tell if anything feels wrong or redundant.

@floere
Copy link
Member

floere commented Oct 26, 2015

@manuyavuz All good! 👍 🚀

@manuyavuz
Copy link
Member Author

Assuming this is ready to ⛵️, should I wait this to be merged in order to finish the last commit in CocoaPods/CocoaPods#4249? You remember I was directed Core repo to my branch for specs to be passed. If we merge this, I can revert it and submit last commit, so that we see green lights from boss travis.

@manuyavuz
Copy link
Member Author

Added changelog.

@segiddins
Copy link
Member

🚀 🚢

segiddins added a commit that referenced this pull request Oct 27, 2015
Changes search index format to fasten `pod search --full` command.
@segiddins segiddins merged commit d66fcb7 into CocoaPods:master Oct 27, 2015
Ashton-W pushed a commit to Ashton-W/Core that referenced this pull request Nov 2, 2015
…formance

Changes search index format to fasten `pod search --full` command.
@manuyavuz manuyavuz deleted the improvement/searchPerformance branch November 3, 2015 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants