Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discrete log #130

Merged
merged 40 commits into from
Sep 12, 2018
Merged

Discrete log #130

merged 40 commits into from
Sep 12, 2018

Conversation

b-mehta
Copy link
Contributor

@b-mehta b-mehta commented Aug 22, 2018

Discrete logarithms and Dirichlet characters. Resolves #88. In roughly this order, this is my plan for full efficient implementation of both.

  • Added MultMod type to represent elements of the multiplicative group mod m, with Monoid instance. Added PrimitiveRoot type to represent primitive roots of the (cyclic) multiplicative group.
    Note these two types together make discreteLogarithm a total function. Partial constructors have been added and deconstructors. (I would like feedback on the name MultMod, I feel a better name is possible.)
  • Amended primitive root code, testing, benching with the new type structures.
  • Very simple discrete log code, with test and bench.
  • Baby-step giant-step method for discrete logs.
  • Additional testing and benching for discrete logs, with prime power cases included also.
  • Refactor discrete logs to be efficient on prime powers, using Bach's method described here and here. (Also use the canonical isomorphism for double prime powers).
  • Implement Pollard's rho algorithm for discrete logs with prime moduli (Modular equations #129 would be helpful for this). Notably, Pollard's rho is a Las Vegas algorithm (which is why @Taneb's attempt in (WIP) Discrete logarithm #110 did not always return success) so may need to run multiple times - experiment with starting inputs.

@Bodigrim
Copy link
Owner

Cool! I'll take a look by the end of the week.

@@ -180,6 +191,39 @@ infixr 8 ^%
-- of type classes in Core.
-- {-# RULES "^%Mod" forall (x :: KnownNat m => Mod m) p. x ^ p = x ^% p #-}

-- | This type represents elements of the multiplicative group mod m, i.e.
-- those elemets which are coprime to m. Use @toMultElement@ to construct.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo, "elements".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this catch!

else Nothing

-- | Powers can be taken without leaving the multiplicative group.
powMultMod :: (KnownNat m, Integral a) => MultMod m -> a -> MultMod m
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd rather implement Data.Semigroup.stimes method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, sounds reasonable.

-- | For elements of the multiplicative group, we can safely perform the inverse
-- without needing to worry about failure.
invertGroup :: KnownNat m => MultMod m -> MultMod m
invertGroup = MultMod . fromJust . invertMod . multElement
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be safe, but still it is better to pattern-match by Nothing manually and throw a custom error, which explicitly mentions invertGroup as a source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, sounds good, fixed.

table = M.fromList $ zip babies [0..(m-1)]
bigGiant = powMultMod a' (- fromIntegral m)
giants = fromInteger . getVal . multElement <$> iterate (<> bigGiant) b
in head [i*m + j | (v,i) <- zip giants [0..(m-1)], j <- maybeToList $ M.lookup v table]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us convert lets into a where-clause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hope is to replace this with a more space-efficient implementation with Pollard's rho, so hopefully this function should not be in the merge and so if it's okay I'll leave this as is for now.
In the future I'll make sure to use where clauses in similar scenarios though!

@Bodigrim
Copy link
Owner

Implement Pollard's rho algorithm for discrete logs with prime moduli (#129 would be helpful for this)

I pushed solveLinear into master.

@b-mehta
Copy link
Contributor Author

b-mehta commented Aug 23, 2018

Great, thanks!

@@ -180,6 +187,39 @@ infixr 8 ^%
-- of type classes in Core.
-- {-# RULES "^%Mod" forall (x :: KnownNat m => Mod m) p. x ^ p = x ^% p #-}

-- | This type represents elements of the multiplicative group mod m, i.e.
-- those elements which are coprime to m. Use @toMultElement@ to construct.
newtype MultMod m = MultMod { multElement :: Mod m }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been never good with names :)
Maybe MultGroupMod?


-- | Attempt to construct a multiplicative group element.
isMultElement :: KnownNat m => Mod m -> Maybe (MultMod m)
isMultElement a = if getVal a `gcd` getMod a == 1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use getNatVal and getNatMod respectively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but out of interest why are these preferable?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal representation of Mod m uses Natural both for value and for modulo.

-- discreteLogarithm a b = let n = prefValue . groupSize . getGroup $ a
-- a' = unPrimitiveRoot a
-- vals = genericTake n $ iterate (<> a') mempty
-- in fromIntegral $ fromJust $ elemIndex b vals
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented lines?


import Data.Semigroup
import Data.Maybe
-- import Data.List
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented lines?

data PrimitiveRoot m =
PrimitiveRoot { unPrimitiveRoot :: MultMod m -- ^ Extract primitive root value.
, getGroup :: CyclicGroup Natural -- ^ Get cyclic group structure.
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference is to put constructor on the same line with data and list fields indented by two spaces.

import Gauge.Main
import Data.Maybe

-- import Math.NumberTheory.Moduli (Mod, discreteLogarithm, PrimitiveRoot, isPrimitiveRoot, MultMod, isMultElement)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented lines?

InfMod{} -> False

isPrimitiveRootProperty2 :: Positive Natural -> Bool
isPrimitiveRootProperty2 (Positive m)
= isNothing (cyclicGroupFromModulo m)
|| case someNatVal m of
SomeNat (_ :: Proxy t) -> any isPrimitiveRoot [(minBound :: Mod t) .. maxBound]
SomeNat (_ :: Proxy t) -> not $ null $ mapMaybe isPrimitiveRoot [(minBound :: Mod t) .. maxBound]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather write any (isJust . isPrimitiveRoot).

@b-mehta
Copy link
Contributor Author

b-mehta commented Aug 30, 2018

The Bach reduction gave a large speedup for prime powers: the case 2^n = 7 mod 3^20 went from around 41ms to 3.5µs and 3^n = 17 mod 5^16 went from 549ms to 3.8µs. Range benchmarks did not change significantly, which is expected since prime powers are rare compared to primes in general. I'll modify benchmarks for prime powers now, using larger powers to make up for this speedup.

@b-mehta
Copy link
Contributor Author

b-mehta commented Sep 1, 2018

I've changed some things, removing some of the rem calls. When I benchmark, using m * 2 instead of shiftL is noticeably faster, so I've kept it. Since Pollard reduces to identical code as the tuples, I've left that as is also.

The calls to mod in line 92 may not be strictly necessary - I originally kept them because solveLinear' was not exported and so I wanted to make sure the input was as expected. Also, it is likely that the mod call will reduce arguments into a smaller range, so the function may run faster, but I'm happy to remove them if necessary.

@Bodigrim
Copy link
Owner

Bodigrim commented Sep 1, 2018

solveLinear' should work fine for unreduced arguments. Even without reduction by modulo bi-b2i and ai-b2i are O(n) with probability close to 1, so there should not be any performance hit. I suggest to remove mod from line 92 for the sake of clarity.

Experimentally, the baby-step giant-step algorithm performs better
on moduli below 10^8, and Pollard runs better for large moduli.
In addition, this means BSGS can use Ints internally.
Pollard's rho is a Las Vegas algorithm, with a negligible
probability of failure for a given input.
@Bodigrim
Copy link
Owner

Bodigrim commented Sep 9, 2018

Awesome. I greatly appreciate your work.

Does it make sense to split this PR into two? The first one about discrete log (which is ready to merge, up to my understanding) and the second (future) one about Dirichlet characters. What is your opinion?

@b-mehta
Copy link
Contributor Author

b-mehta commented Sep 9, 2018

I think this is a good idea - in fact I'd like to split it into three PRs: The currently implemented work on discrete logs (which is ready to merge), possible speed ups on discrete logs (including Pollig-Hellman, which I'm investigating at the minute) and Dirichlet characters.

Something worth pointing out is that the algorithm is probabilistic, as I mentioned in the most recent commit, and so has a (negligible) probability of failing. Currently three starting inputs are used, which cubes the failure probability making it even more negligible, but still non-zero. Possible fixes are to allow all starting inputs on failure, so that the function is guaranteed not to fail, or just to use a small amount and hope for the best (cf line 87 in the last commit).
I'm testing how likely failure actually is in my branch here - so far for primes less than 7000, all possible instances of pollard's rho succeed by a super naive brute force search
What is your opinion on this?

(Unrelatedly, hope you enjoyed your vacation!)

@Bodigrim
Copy link
Owner

Bodigrim commented Sep 9, 2018

It would be nice to find via brute force at least one example, when a) implementation with only one seed fails to find an answer, b) implementation with two seeds fails, c) current implementation with three seeds fails. And add them to tests.

Can we resort to BSGS, when Pollard rho has failed with three seeds?

@b-mehta
Copy link
Contributor Author

b-mehta commented Sep 9, 2018

That would be nice, but I fear that the probability of failure is genuinely tiny - testing all primes less than 7000 represents over 5 billion instances of the discrete log problem, and no failure has yet occured.

Possibly, but BSGS uses IntMaps internally and right now BSGS is used anyway when the inputs are less than 10^8 so in that scenario, the values may not fit within an Int. In addition, if Maps are used instead, it is possible that massive maps are made and create memory issues instead. I'd be more comfortable allowing more seeds instead - since the performance hit of this will almost certainly be smaller than the performance hit of BSGS.

@Bodigrim
Copy link
Owner

I may be wrong, but my impression is that your implementation of Pollard is not probabilistic. AFAIU we say that Pollard algorithm failed for a given seed when the algorithm found a collision with bi == b2i, for which solveLinear' returns a full, unrestricted range. But in our case we just resort to brute force check-ing.

@b-mehta
Copy link
Contributor Author

b-mehta commented Sep 10, 2018

If bi == b2i, then solveLinear' attempts to solve 0x + a = 0, and returns the empty list, so we immediately fail.

Edit: My mistake, we additionally have that ai == a2i, so brute force checking does occur.

@Bodigrim
Copy link
Owner

Bodigrim commented Sep 10, 2018

Yep. For small arguments it happens quite often, which can be traced as follows:

    go tort@(xi,ai,bi) hare@(x2i,a2i,b2i)
      | xi == x2i
      , (bi - b2i) `mod` n == 0
      , traceShow (p, a, b) False
      = undefined
      | xi == x2i = solveLinear' n (bi - b2i) (ai - a2i)
      | otherwise = go (step tort) (step (step hare))
  • Let us have a long sequence of seeds, like [(a, b) | a <- [0..n], b <- [0..n]].
  • Let us proceed to solveLinear' only if gcd (bi - b2i) n is small (below sqrt n?). Otherwise take the next seed.

@b-mehta
Copy link
Contributor Author

b-mehta commented Sep 10, 2018

Right, I was doing exactly this. That approach sounds reasonable. Thanks for this catch!

n = p-1 -- order of the cyclic group
halfN = n `quot` 2
mul2 m = if m < halfN then m * 2 else m * 2 - n
step (xi,!ai,!bi) = case xi `rem` 3 of
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each step performs two divisions, which I expect to take a lot of time. I wonder whether we can avoid the first one. The algorithm is expected be robust to a somewhat unequal split into three subsets.

  • We can cast xi to Word and apply rem afterwards. At the very least we will avoid horribly expensive division of long Integer and in the best scenario compiler will apply this trick.

  • More aggressive approach is to skew distribution even further, looking at only 3 or 4 bits of xi. Something like this:

case fromInteger xi .&. (7 :: Word) of 
  0 -> (xi*xi `rem` p, mul2 ai, mul2 bi)
  1 -> (xi*xi `rem` p, mul2 ai, mul2 bi)
  2 -> ( a*xi `rem` p,    ai+1,      bi)
  3 -> ( a*xi `rem` p,    ai+1,      bi)
  4 -> ( a*xi `rem` p,    ai+1,      bi)
  _ -> ( b*xi `rem` p,      ai,    bi+1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea! I recall reading that the expected time taken for Pollard can be improved to the expected random bound by using more partitions - I'm sure I read that tighter bounds were achieved with 20 but I can't seem to find that result now. With that in mind, it may be interesting to do something like fromInteger xi .&. (15 :: Word), and have 16 different cases.

In either case, would it be acceptable to take optimisation work like this into a different PR, and merge (something like) this one for now?

Copy link
Contributor Author

@b-mehta b-mehta Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like f_T and f_C from (Speeding up pollard's rho method for discrete logarithms, Teske 1998), also described in 2.2.2 and 2.2.3 here, in which experimental performance bounds are described.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In either case, would it be acceptable to take optimisation work like this into a different PR, and merge (something like) this one for now?

Sure. Could you please add a -- TODO comment with your ideas and links? It is easy to lose them in the depth of a closed PR.

@b-mehta
Copy link
Contributor Author

b-mehta commented Sep 12, 2018

This should be good to go. Note that Math.NumberTheory.Moduli.Equations exports solveLinear' - hopefully this is acceptable @Bodigrim.

@Bodigrim Bodigrim changed the title (WIP) Discrete log and Dirichlet characters Discrete log Sep 12, 2018
@Bodigrim Bodigrim merged commit dd59557 into Bodigrim:master Sep 12, 2018
@Bodigrim
Copy link
Owner

Note that Math.NumberTheory.Moduli.Equations exports solveLinear' - hopefully this is acceptable

I've thought about it and decided to revert this new export in 473a5ef. An additional public function needs documentation, tests and is a commitment to support its interface. Being public, solveLinear' should deal somehow with negative modulo and zero modulo, it may also be reasonable to make it working for any Integral a instead of Integer only. That said, from the maintainer's viewpoint it is preferable to use solveLinear only.

@b-mehta
Copy link
Contributor Author

b-mehta commented Sep 16, 2018

Fair enough. Is an acceptable compromise to assert that the modulo must be positive, and that the precondition is not checked, similar to integerSquareRoot' from M.NT.Powers.Squares? That way users can still use solveLinear' for Integers without having to wrap values up using someNatVal.

@Bodigrim
Copy link
Owner

Unchecked preconditions are the root of all evil :) They are the worst plague of arithmoi API in its current state.

Currently I am tying up loose ends for an upcoming release; I'll be glad to return to solveLinear' question afterwards. Please raise an issue to facilitate discussion.

@b-mehta
Copy link
Contributor Author

b-mehta commented Sep 16, 2018

Unchecked preconditions are the root of all evil :) They are the worst plague of arithmoi API in its current state.

I see! Fair enough - I have no real complaints about the state of solveLinear' as it is.

@b-mehta b-mehta deleted the dirichlet-characters branch September 16, 2018 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants